THE ACTIVITY MOVEMENT

By CLYDE HISSONG

In an attempt to overcome the weaknesses of the traditional school
organization many progressive schools have developed new programs.
These programs are so similar in character that collectively the
changes have been referred to as the activity movement. This
movement has claimed the center of the educational stage for a length
of time sufficient to have engendered widespread interest in its
outcomes and in its basic philosophy.

In Doctor Hissong’s study an attempt has been made to discover
the principles underlying the present activity movement, to determine
the influence of traditional concepts in shaping the trends of the
movement, and to see if in the light of the present knowledge of the
child and his relation to his environment the movement rests upon a
justifiable basis.


DOES COLLEGE TRAINING INFLUENCE TEST
INTELLIGENCE?


L. D. HARTSON 
Oberlin College 


The scores made in an intelligence test by Oberlin seniors have 
been compared with their freshman scores. The test is the Ohio State 
University Psychological Examination. Two classes, 1930 and 1933, 
totaling one hundred seventy-six men and two hundred twenty-seven 
women, were the subjects. Their M age was 18 years, 2.25 mos.
(± 12.8 mos.). As freshmen the 1930 class was tested by Form 9 or
10, and as seniors by Form 15 or 16; the 1933 class took Form 15 or 16 
as freshmen and Form 13 as seniors. Forms 15 and 16 are revisions of 
the material used in Forms 9 and 10. There is therefore the possibility
that some specific items of information were retained. After taking the 
test once as freshmen, those in the class of 1930 may have ascertained 
the correct answers to some of the troublesome questions and remem- 
bered them thereafter. However, the retesting of the second class 
constitutes a “control series,” since Form 13, used with the class of
1933 as seniors, is composed entirely of items that differ from those in
Forms 15 and 16. There is a slight difference (.60 centile points) in favor of the
class of 1930 in the amount of improvement between freshman and 
senior year. As this is less than six per cent of the M gain shown by the 
two classes, it seems fair to conclude that the learning of specific items 
of the freshman test constitutes but a small portion of the gain made. 

The examination has five parts: Test 1 (Arithmetic); 2 (Synonyms-
Antonyms Vocabulary); 3 (Verbal Analogies); 4 (Number Series); 5
(Reading Comprehension). Scores were transmuted into centiles,
using the State freshman norms, in order to equate the scales for
variations in difficulty. Tables I and II¹ present the essential data for
comparing freshman with senior scores when the population is divided
according to major groups and according to sex. Scores are reported
both for “rights” and for “accuracy,” accuracy here indicating the
ratio of rights to attempts. The accuracy figures here employed are
centiles based upon norms derived from five successive class groups, a
total of seventeen hundred twenty-six students.

¹ In Table II, where the amount of gain reported shows a tie, there was an
actual difference in the second decimal column.
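
As a minimal sketch of the accuracy measure defined above (rights divided by attempts), the following Python function computes the raw ratio. The function name and the sample counts are hypothetical illustrations; the conversion of the ratio into a centile depends on the Oberlin norms, which are not reproduced in this report.

```python
def accuracy_ratio(rights: int, attempts: int) -> float:
    """Return rights / attempts, the raw accuracy ratio.

    Returns 0.0 when no items were attempted (an assumption made here
    so the function is total; the report does not discuss this case).
    """
    if rights < 0 or attempts < 0 or rights > attempts:
        raise ValueError("need 0 <= rights <= attempts")
    return rights / attempts if attempts else 0.0


# Hypothetical example: 40 items attempted, 30 answered correctly.
example = accuracy_ratio(30, 40)
```

Two students with the same rights score can therefore differ in accuracy if one of them attempted more items.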


RESULTS IN TERMS OF RIGHTS SCORES 


The basic finding of this study is that there is a significant improve- 
ment in the total scores. The M rights score for the four hundred three 
students as freshmen was seventy-three, as seniors it was eighty-four. 
For this eleven point difference the critical ratio is 7.4. As a matter of 
fact, the improvement was greater than this, for one hundred thirty- 
six, or more than one-third of the students, made initial scores of ninety 
or more. For this large group the scale limit of one hundred masks 
gains greater than ten points. Of these, nine made scores of one hun- 
dred in the first test. The forty-two whose senior score was one hun- 
dred averaged fifteen more than the minimum necessary to score in the 
highest centile. Again, the gain in total score was made in the face of 
a loss of 1.4 in Test 1 (Arithmetic). Table II shows losses in the M 
rights scores of seven groups, viz.: Economics, 1.7; English, 3.1;
History, 6.2; Languages, 6.8; Music and Fine Arts, 5.9; Physical 
Education, 2; and Sociology, .5. With the exception of the groups 
in Economics and History, the preponderant sex in these majors is 
female. As half of the History majors and ninety per cent of those in 
Economics are men, however, this failure to improve in arithmetical 
intelligence is evidently not a sex-limited trait. 

Here then is evidence of real improvement in intelligence test 
scores, which is not a result of the factor of selection, inasmuch as the 
students constituting the two groups are identical. The subjects of our 
study are, to be sure, selected; selected in two ways: (1) They were as 
freshmen superior to the general run of students, their M score being 
nearly the equivalent of the Q3 of Ohio freshmen; (2) they are a superior
sample of these superior freshmen. The M score of the original total of
seven hundred in the two classes is 68 ± 25.6. The M score for the
two hundred ninety-seven who fell behind or transferred from Oberlin
is 62 ± 27. Comparing this with seventy-three, which represents the
M score of those who persisted, shows a difference in favor of those who 
remained in Oberlin of eleven (critical ratio of difference: 5.7). 


DOES EITHER SEX SHOW CONSISTENT SUPERIORITY?


There are no significant differences between the total scores of the 
men and the women. The men have a freshman superiority of 1.11,
which is reduced to .38 by the senior year. As usual, however, the 
totals smooth out the differences which exist in the independent test 
scores. As has been found year after year at Oberlin, men surpass 
women in tests employing numbers, and in this study they are some- 
what superior in the Reading Comprehension. In Arithmetic the men 
increased their initial lead of thirteen (critical ratio of difference: 5.5) 
to twenty-one (critical ratio: 8.1). In Number Series their initial lead 
of thirteen (critical ratio: 4.7) although reduced to ten (critical ratio: 
3.8) is unmistakable. On the other hand, the women have the better 
record in the vocabulary tests, and one which becomes somewhat more 
marked by the senior year. In Synonyms-Antonyms a difference of 
but .2 (critical ratio: .08) is increased to one (critical ratio: .61), and in 
Verbal Analogies an initial superiority of five (critical ratio: 2.4) is 
changed to six (critical ratio: 2.9). The figures for accuracy, while 
paralleling those for rights, are less reliable. 

In general, these sex differences are reflected in the choice of major 
subjects. The groups consisting predominantly of women are rela- 
tively poor in the tests employing numbers (Sociology, Physical 
Education, Music-Fine Arts, Languages, Philosophy-Bible and Eng- 
lish), whereas the groups dominated by men did relatively well in these 
tests (Economics, Physical Science, and Political Science). The 
exception is found with the mathematicians, half of whom are women. 
The test in which the women are reliably superior (Verbal Analogies) 
is that in which the leading positions are held by English and Languages, 
which contain ninety-three women and but thirty-six men. As the 
test differences associated with subject-matter are more significant than 
those with sex, the latter have been disregarded in making the compari- 
son between major groups. 


RELATIVE IMPROVEMENT AT THE DIFFERENT SCORE LEVELS 


Do the originally more capable improve more than the less capable?
In Table III are presented data which show the comparable gains made
by students of different test levels. The greatest improvement is
found with those who originally scored in the 21-30 decile, though their
gain was not materially greater than that of the lowest decile group.
Above the third decile the curve slopes down rather uniformly from
33.21 to 1.85 points of improvement. It is clear that those who gained
most are those who originally scored low. Two observations may be
made: (1) The limitations of the test interfered with the achievement of
great gains on the part of those originally scoring high; (2) Even the
least capable freshmen were not so intellectually mature as to be
incapable of making remarkable improvement in test intelligence after the
age of eighteen.

TABLE III.—MEAN GAINS MADE BY STUDENTS GROUPED BY DECILES ACCORDING
TO FRESHMAN TOTAL SCORE, WITH SIGMAS

                         Gain
  Decile       N      Mean    Sigma
  91-100     108      1.85     2.86
  81- 90      66      6.68     6.45
  71- 80      60     11.03     8.88
  61- 70      45     15.08    12.55
  51- 60      32     20.37    12.50
  41- 50      24     21.00    15.20
  31- 40      20     25.70    15.70
  21- 30      14     33.21    15.13
  11- 20       6     23.83    17.26
   1- 10       8     31.75    11.20
  Total      383
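
The decile means of Table III can be pooled to check them against the gains quoted in the text. The sketch below, in Python, weights each group's mean gain by its N; the variable and function names are illustrative only. Under this weighting the 383 tabulated cases give a pooled gain of roughly 11.8 points, and the three lowest deciles taken together give roughly 30.8 points.

```python
# (decile, N, mean gain) as given in Table III
TABLE_III = [
    ("91-100", 108, 1.85),
    ("81-90", 66, 6.68),
    ("71-80", 60, 11.03),
    ("61-70", 45, 15.08),
    ("51-60", 32, 20.37),
    ("41-50", 24, 21.00),
    ("31-40", 20, 25.70),
    ("21-30", 14, 33.21),
    ("11-20", 6, 23.83),
    ("1-10", 8, 31.75),
]


def pooled_mean_gain(rows):
    """N-weighted mean of the group mean gains."""
    total_n = sum(n for _, n, _ in rows)
    return sum(n * gain for _, n, gain in rows) / total_n


overall_gain = pooled_mean_gain(TABLE_III)           # all ten deciles
lowest_three_gain = pooled_mean_gain(TABLE_III[-3:])  # deciles 1-30
```

Weighting by N matters here because the top decile alone holds 108 of the cases, so an unweighted average of the ten group means would overstate the typical gain.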


COLLEGE TRAINING AND VARIETIES OF TEST INTELLIGENCE 


The fact being established that the test intelligence of these students 
improved during three years spent in college, one wonders whether 
scores in the different varieties of test reflect the training associated 
with one’s field of specialization. The examination includes but two 
contrasting forms of material. Tests 1 and 4 require computations in 
numerical symbols; 2 and 3 test verbal relationships. Inspection of the 
tabulated data shows that it is the students who majored in the 
Mathematics-Science division who proved to be most apt in the numer- 
ical computations, and it is those who majored in the Language- 
Literature groups who show greatest ability in handling verbal 
relationships. These are the only major groups in which test scores 
appear to be related definitely to the subject-matter of their courses. 

No consistent contrasts between the major groups appear in the 
data pertaining to the test of Reading Comprehension. This is 
probably due to the fact that the content of the test represents most of
the types of subject-matter found in the liberal arts curriculum. It is 
possible that if separate scores were computed, e.g., for the paragraphs 
dealing with physical science, it might be found that those majoring 
in this field had made higher scores than did the other students. Thus 
far, no such dissection of the test has been made. No further reference 
will be made in this report, therefore, to this test. 


1. Mathematical-scientific Training Influences Scores in Arithmetical 
Tests 


More striking than the good freshman showing of students who later 
chose to major in Mathematics or Science, is their improved status, 
after three years’ training, in the tests employing numerical symbols. 
These students proved to be more capable in Arithmetic, whereas the 
students in the English and Language groups were less intelligent in 
this field as seniors than when they entered college. Moreover, the 
reliability of the superiority of the mathematicians and scientists is 
greater in the senior than in the freshman year. The M’s of the 
critical ratios involving the cases of superiority rose from 1.34, 2.59 
and 3.27 in the freshman year to 3.02, 3.72 and 3.82, by the senior year, 
for the biologists, mathematicians and physical scientists, respectively. 
The largest loss in Arithmetic was made by the majors in Languages, 
and the M of the critical ratios representing the eight instances in 
which this group made a lower score increased from 2.40 to 3.51. 

In the Number Series the mathematicians rose from fourth to first 
position, with a gain of 11.5 points, the biologists from ninth to fifth, 
gaining 15.3. Although the M gain made by the physical scientists is 
small (3.0) this was made from the highest original average; thirty-one 
per cent of their freshman scores were above ninety. In the final 
ranking they lost the lead only to Mathematics and increased the 
reliability of their superiority over five of the other groups from a M of 
1.75 to 2.08. 


2. Linguistic Training Reflected in Vocabulary Test Scores 


In both vocabulary tests the highest scores, initial and final, were 
made by the majors in the foreign languages, although they were but 
ninth in Arithmetic and sixth in Number Series. A similar contrast is 
found in the case of the English majors. Second in the vocabulary 
tests, they are seventh in the arithmetic tests. On the other hand, the 
mathematicians, who rose to first in the tests which employ arithmetical 







488 The Journal of Educational Psychology 


symbols, dropped to eleventh from eighth in Synonyms-Antonyms, and 
from fifth to eleventh in Verbal Analogies. In the latter test they made
the least gain of any of the twelve groups. The scientists, to be sure, 
made greater gains in vocabulary than did the language groups, but this 
is due in great degree to the test limitations. (Thirty per cent of the 
foreign language majors scored above ninety as freshmen, and only 
thirty-two per cent scored below that level as seniors in Verbal 
Analogies.) 


ARE SENIORS MORE ACCURATE THAN FRESHMEN? 


The improvement in accuracy is even more marked than the 
improvement registered in right answers. Mean freshman accuracy 
for these two classes is represented by a centile score of 54.9 (fifty 
denoting the median for five successive classes). By their senior year 
their M accuracy rose to 71.6, a gain of 16.7. The women made a 
substantially greater gain than did the men, 18.8 points to their 14. 
These data might seem to indicate that the women are habitually more 
careful than their male classmates. This, however, was not true at the 
time they entered college, for as freshmen the M score for the men was 
57.3, whereas that for the women was 53.1. The chief reason why the 
women’s score is higher than that of the men in the senior year appears 
to be that twenty-seven per cent of the men, as compared to but nine- 
teen per cent of the women, made no gain over their freshmen record in 
accuracy. Of the ninety students whose senior scores in accuracy 
showed no gain, there were but seven who were less accurate in all 
parts of the test. This suggests, therefore, that the attitude with 
which the seniors took the examination as a whole was not, to any very 
marked degree, characterized by carelessness. Most frequent losses 
in accuracy occurred in the Arithmetic test; the best record was made 
in Verbal Analogies. 

As was noted in the case of the rights scores, the gains in accuracy
would have been greater if there had not been a loss of accuracy in the 
arithmetical tests. In Arithmetic this amounts to 5.4 and in Number 
Series to 1.4. If the examination had omitted these two tests involving 
numerical symbols, the total gain, both in rights scores and in accuracy, 
would, therefore, have been greater than that which actually occurred. 

After confirming the results obtained with the rights scores, the 
data on the subtests reveal significant contrasts in the records of the 
major groups. In general, the same tendencies appear in the accuracy 
series as in the rights comparisons. By some coincidence the same 
coefficient of .741 is found when the correlations are computed between
(a) rank order of groups in total rights and total accuracy in the 
freshmen data and (b) the same comparisons for the senior data. For 
four of the tests, however, the relationship between rights and accuracy 
is closer in the senior year, the exception being the Number Series, in 
which R dropped from .839 to .650. For Reading Comprehension there 
is an increase in R from .217 to .783. 
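
A correlation between two rank orders of the twelve major groups can be sketched with the rank-difference formula, R = 1 - 6Σd²/[n(n² - 1)]. That this is the formula used in the study is an assumption; the report does not name it. The rankings in the example are hypothetical, since the group ranks themselves are given only in the tables, and the function assumes there are no tied ranks.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Spearman rank-difference coefficient for two untied rankings."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical rankings of twelve groups on rights and on accuracy:
# adjacent pairs of groups simply exchange places.
rights_ranks = list(range(1, 13))
accuracy_ranks = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]
r_example = rank_difference_correlation(rights_ranks, accuracy_ranks)
```

With only twelve groups a few exchanges of position move the coefficient appreciably, which is worth bearing in mind when comparing the freshman and senior values.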

(1) In Arithmetic the contrasts between the Mathematics-Science 
and the Language-English group are even more marked than is true of 
the rights scores. In both types of score the former held the lead which 
they took when freshmen, whereas both the Language and English 
groups made decided losses in accuracy. The English majors, with a 
loss of 10.8, dropped from fifth to ninth rank, while the Language group 
dropped from sixth to twelfth position by a loss of 14.8, a larger loss 
than made by any other group. Two-thirds of the Language and 
English majors were more accurate in the freshman than in the senior 
test in Arithmetic. 

(2) In the Number Series, in similar fashion, Mathematics, Physical
Science (and Economics) continued in high rank in both freshman and 
senior year, whereas the Language group dropped from seventh to 
eighth, and English from sixth to tenth, making the largest loss of any 
group. 

(3) On the other hand, the groups which made such marked losses 
in Arithmetic and Number Series, are the outstanding leaders in the 
vocabulary tests. In both of these tests, the Language majors show 
marked superiority. In Synonyms-Antonyms they rose from sixth to 
first, by a gain of 21.7, and in Verbal Analogies from second to first, by 
a gain of 25.1 points. The record of the English majors is not quite so 
consistent, for they dropped from second to fourth in Synonyms- 
Antonyms. However, they rose from third to second in Verbal 
Analogies, by a gain of 20.3. 

To be sure, all of the groups had better records for accuracy in the 
senior year in the vocabulary tests. Comparisons have to be made 
therefore in terms of relative gains. In this respect the contrast 
between the mathematicians and the language majors is very clear. In 
Synonyms-Antonyms the mathematicians dropped from tenth to 
eleventh, and in Verbal Analogies they failed to rise above the eleventh 
position. The biologists made substantial gains in both tests and 
improved their relative position from eighth to sixth, and from ninth to 
seventh, but the physical scientists made the least gain of any group in 
Verbal Analogies, with the result that, whereas they were in first
position when freshmen, they were but ninth in the senior comparison.


SUMMARY OF MORE SIGNIFICANT FINDINGS 


Equivalent forms of an intelligence examination were administered 
to four hundred three seniors three years after their freshman test. 

1. The senior scores show gains over freshman scores, which, in 
spite of the fact that the test was too easy for at least one-third of the 
students, amounts to a M for the rights of about 11 centile points, with 
a critical ratio of 7.4; and for accuracy of 17, with a critical ratio of 8.7. 

2. The greatest gains were made by those whose freshman rights 
scores were in or below the 21-30 decile. Above this level there is a 
negative correlation between initial score and amount of improvement. 
This is probably due, in large measure, to the fact that the examination 
was too easy for the more capable freshmen. Those to whom the test 
offered the greatest opportunity for improvement (the three lowest 
deciles) gained, on the average, over thirty points. 

3. Of greater significance than the increase in total score is the 
discovery of a definite relationship between the nature of the tests 
in which improvement occurs and the character of the courses in which 
one has specialized. A sharp contrast exists between the English and 
Foreign Language majors and those in the Mathematics-Science groups. 
The former excel in the vocabulary tests and the latter in numerical 
computations. The initial contrast is sharpened by the senior year, 
particularly by the fact that, while the Mathematics-Science groups 
continue growing, the English-Language majors are less intelligent in 
handling numerical symbols than when they entered college. 


GENERAL CONCLUSION 


The most important generalization to be drawn from this study is 
that improvement in intellectual abilities continues during the college 
period in the fields in which that experience provides exercise. Con- 
tinued maturation is not a general process of unfoldment of inner 
capacities, which occurs independently of particular training. In 
abilities required for numerical computations a large proportion of 
these students were more intelligent when they entered college than 
when they graduated. Is it not reasonable to suppose that this is 
largely due to the fact that memory for the technics used in arithmetical 
computations has dimmed from lack of practice? When confronted 
with problems which one only half remembers how to solve, and yet 
realizes that he once solved, and is working under the pressure imposed
by time limitations, one is more apt to become confused, as is indicated 
by the increased amount of inaccuracy characterizing most of the 
seniors in this field. On this hypothesis, it seems probable that the 
practice of employing considerable amounts of mathematics in intel- 
ligence tests, may, in part, explain why investigators have frequently 
found no improvement in intelligence scores beyond the age for 
graduating from high school. Professor and Mrs. Miles, for example, 
who employed an abridgement of the Otis Self-Administering Test, 
the items of which are nearly one-fourth mathematical, have concluded:
“From the high point in the intelligence score curve represented at
18 years of age the trend is at first almost level, then gradually
declines.”¹

Our study shows, on the other hand, that with reference to those 
types of intellectual performance in which their work stimulated 
further development, namely, in the verbal relationships, which are 
commonly recognized as the best indices of intelligence, the “age of
maturity” had not been attained at eighteen. It seems fair to conclude,
in fact, that growth in such intelligence as is measured by the
vocabulary tests may continue for some years beyond that of college
matriculation.





¹ Miles, C. C. and W. R.: “The Correlation of Intelligence Scores and
Chronological Age from Early to Late Maturity.” American Journal of
Psychology, Vol. XLIV, 1932, p. 77.





AN EXPERIMENTAL STUDY OF THE TRANSFER OF 
TRAINING WITH SPECIAL ATTENTION TO THE 
RELATION OF INTELLIGENCE TEST PERFORMANCE 


DAVID G. RYANS 
Eveleth Junior College, Eveleth, Minnesota 


Following the accepted definition, we generally speak of intellectual 
ability as though it were best demonstrated in situations which demand 
the capable retention and transference of previously learned materials 
and their application to an immediate problem. Theoretically, it 
seems that no process could be more important for what we call intel- 
ligent behavior than positive transferability. Every problem requires 
reference to the preceding experiences of the individual and judgment 
in light of those experiences. Every creative task involves the acquisi- 
tion and retention of an enormous number of pertinent facts and the 
reorganization of the same. In fact, even routine procedures are not 
entirely exempt from this emphasis upon transfer. So it is rather 
natural that one should observe and indicate the fundamental character 
of transference with regard to intellectuality. 

Proceeding on this assumption that the ability to transfer and 
brightness are all but synonymous, the builders of intelligence tests 
have constructed measures of retentive and reasoning abilities in the 
light of the more or less common experiences of the individuals of a 
given culture. But adequate analysis of the ‘“‘something’’ measured 
by these instruments has lagged far behind the wholesale development 
and production of tests. It was with this last fact in mind that the 
present study was undertaken. 

The existence of a significant correlation between learning perform- 
ance and intelligence or brightness, both in animal and human sub- 
jects, is not often questioned. And, as heretofore stated, ability to 
transfer is accepted as a requisite of intelligent behavior and adjust- 
ment. From a logical point of view, then, transference, learning, and 
intelligence should be closely related. To the contrary, rats which were 
most efficient in an original maze situation were found by Rockwell 
(52, pp. 1-201)* to be surpassed by their thyroidectomized (and by analogy
to human clinical cases, mentally deficient) litter-mates in subsequent
learning involving similarity of response. Brighter rats seemed
to show a greater amount of negative transfer, to do less well on a second 
problem, than dull rats. 





* The number refers to that of the reference in the bibliography. 

The investigation herein reported attempted a somewhat similar
study of human subjects in an effort to discover whether or not like 
tendencies might be apparent and demonstrable. By use of intelli- 
gence test data in conjunction with transfer indices, the aim was to 
determine the relationship, if any, existing between the function
operative in negative transfer, habit interference, associative inhibition, 
etc. and intellectual ability as measured by our instruments. 


LITERATURE 


The educational significance of transfer of training has led to 
extensive experimentation in the various fields of mental and sensori- 
motor functioning. Mention of the classical studies would seem trite. 
Whipple’s review in the Twenty-seventh Yearbook of the National Society
for the Study of Education (51, pp. 179-209) will suffice to cover that 
material. 

Whipple presents an inclusive summary of the results obtained by 
James, Peterson, Ebert and Meumann, Dearborn, Fracker, Winch, 
Sleight, and Woodrow in studies dealing with memory; Thorndike and 
Woodworth, Judd, Coover and Angell, Whipple, Foster, Dallenbach, 
Ruger, and Wang with perception, discrimination, and apprehension; 
and Bagley, Squire, Ruediger, Winch, Starch, Wallin, Briggs, Hewins, 
Rugg, and Thorndike with school-room activities and attitudes. In 
spite of the lack of adequately reliable data the studies lead one to 
conclude that transfer of training probably takes place with regard to 
most activities, and that this transfer may be either positive (training 
in one activity facilitating learning of another) or negative (training in 
one activity hindering or interfering with the learning of another). 
Further, the amount of transfer seems to be a function of the task, 
the individual, and the experimental method and interpretation. It is 
assumed that transfer depends largely upon native mental ability 
since high intelligence is characterized by ‘“‘generalized reactions”’ 
and dullness by the absence of them. 

McGeoch, McDonald, Bergstrom, Culler, Kline and Krasnopolsky, 
Jersild, Alm, Dashiell, Carroll, Archer, Cole, James, Jones, Spearman, 
and others have reported investigations of the conditions and measure- 
ment of habit interference, negative transfer, and perseverance. 

In the field of comparative psychology Dashiell, Ho, Jackson, 
Wiltbank, Wylie, and Webb have attacked the problem of transfer of 
training in the maze-learning of rats. Webb also made comparisons of 
rats and human subjects in transferability on the maze and observed 
striking similarities of performance. Evidences of both positive and
negative transfer have been obtained in proportions varying with the 
nature and complexity of the maze problem. Dashiell (10, p. 330) 
offers an excellent summary of the earlier work. Alm, Dashiell, 
Fritz, Haney, Hunter, Pearce, Rockwell, and others have dealt
especially with interference in the rat’s auditory, visual, and maze habits.

The literature yields inconclusive evidence regarding transfer of 
training. It seems probable that the learning of one function may 
affect subsequent learning of another either positively or negatively.
Certain activities do seem to interfere with the performance of others 
and the interference appears to be rather closely related to the similarity 
of the materials and the degree of automaticity of the activity. The 
relationship between brightness and transfer of training seems to be 
slight and variable with respect to the problem. 


PROCEDURE 


Numerous transfer problems were employed at the beginning of 
the investigation in an attempt to discover one that would yield nega- 
tive as well as positive transfer of training. The most successful instru- 
ment proved to be a letter-digit substitution exercise involving twelve 
one-minute learning periods distributed with fifteen-second rest periods 
between learnings. 

Test I required the learning of a key (H-1, F-2, K-3, Z-4, M-5) and 
the substitution of digits for the corresponding letter on the test page. 
Test II was similar, except that the key was changed (H-4, F-5, K-2, 
Z-1, M-3). Test II was given five minutes after the completion of 
Test I. The materials were mimeographed and the arrangement of 
letters to be substituted for was the same on each page. 

Sample sections of the tests are shown below: 


Test I 
Key 
HFKZM 
12345 
FKMZHMKZHFKZMFHKFHMZMZFHK 
MZKFHZKMHZMKFKHFZMZMHFKHHF 


Test II 
Key 
HFKZM 
45213 
FKMZHMKZHFKZMFHKFHMZMZFHK 
MZKFHZKMHZMKFKHFZMZMHFKHF 
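
The two keys can be written out as lookup tables, which makes the source of the expected interference easy to see. The following Python sketch applies each key to the first sample row; the names are illustrative and are not part of the original materials.

```python
# Letter-to-digit keys as stated for Test I and Test II.
KEY_TEST_I = {"H": "1", "F": "2", "K": "3", "Z": "4", "M": "5"}
KEY_TEST_II = {"H": "4", "F": "5", "K": "2", "Z": "1", "M": "3"}


def substitute(letters: str, key: dict) -> str:
    """Return the digit string a subject should write for a row of letters."""
    return "".join(key[letter] for letter in letters)


sample_row = "FKMZHMKZHFKZMFHKFHMZMZFHK"
answers_test_i = substitute(sample_row, KEY_TEST_I)
answers_test_ii = substitute(sample_row, KEY_TEST_II)
```

Because every one of the five letters takes a different digit under the second key, each response learned on Test I is a wrong response on Test II, even though the letters, the digits, and the page layout are unchanged.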

The testing was preceded by a thirty-second digit-writing period,
during which each subject wrote the digits 1, 2, 3, 4, 5 (in order), with 
emphasis on speed. A thirty-second practice period of letter-digit 
substitution with the key A-9, U-6, L-7, R-8 served as preparation for 
the actual experimentation. 

Instructions were explicit, cautioning as to accuracy but urging the 
completion of as many substitutions as possible. Consecutive sub- 
stitution was required. 

The subjects in this study consisted of one hundred high-school
sophomore girls, for whom Intelligence Quotients had been derived from
Kuhlmann-Anderson Test results.


RESULTS 


In interpreting results the Substitution score was taken as the total 
number of substitutions correctly made during the twelve one-minute 
working periods. The Interference score embraced the total number of
changes and errors made during the trials. Any number that was 
changed, erased, or incorrectly substituted was considered an 
interference. 
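
The two scores defined above can be sketched for a single working period as follows. This is an illustration under stated assumptions, not the scoring procedure actually used: changes and erasures cannot be recovered from a final string of digits, so they are passed in as a separate hand count, and a digit that was changed but ends correct is assumed to count both as a substitution and as an interference. All names are hypothetical.

```python
KEY_TEST_I = {"H": "1", "F": "2", "K": "3", "Z": "4", "M": "5"}


def score_period(letters, response, key, changes=0):
    """Return (substitution_score, interference_score) for one period.

    letters  -- the letters the subject worked through, in order
    response -- the digits finally left on the page, one per letter
    changes  -- hand count of changed or erased digits (assumption)
    """
    correct = 0
    wrong = 0
    for letter, digit in zip(letters, response):
        if key[letter] == digit:
            correct += 1
        else:
            wrong += 1
    return correct, wrong + changes


# Hypothetical period: five letters, one wrong digit, two erasures.
scores = score_period("FKMZH", "53541", KEY_TEST_I, changes=2)
```

Summing the first element over the twelve periods gives the Substitution score, and summing the second gives the Interference score, under the assumptions noted.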

The reliability of the tests as measured by the correlation of odd- 
with-even trials is sufficiently high for us to assume that the instruments 
are consistent in the measurement of ability involved in letter-digit 
substitution. Substitutions on Test I yield a reliability coefficient 
(one hundred cases) of 0.89 ± .013; substitutions on Test II, 0.95
± .006; interferences on Test I, 0.72 ± .032; and interferences on Test
II, 0.83 ± .022. These coefficients are raised to 0.94, 0.97, 0.83, and
0.91, respectively, when the Spearman-Brown formula is applied.
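
The Spearman-Brown step for an odd-even split is r' = 2r / (1 + r). The short Python sketch below applies it to the four half-test coefficients; the function name is illustrative. It reproduces the reported 0.94, 0.97, and 0.91. For the coefficient of 0.72 it gives about 0.84 rather than the reported 0.83, a small discrepancy that may reflect rounding of the original figures.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimate full-test reliability from a part-test correlation.

    factor is the ratio of full length to part length (2 for odd-even).
    """
    return factor * r_half / (1 + (factor - 1) * r_half)


half_test_coefficients = [0.89, 0.95, 0.72, 0.83]
stepped_up = [spearman_brown(r) for r in half_test_coefficients]
```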

The average number of substitutions completed on Test I was 
627.3 and on Test II, 599.6. The difference between substitutions on 
Tests I and II of -27.7 is hardly significant, the $D/SD_{\text{diff}}$ being 1.98.
Of the one hundred subjects, twenty-nine improved their substitution
scores on the second test showing an average gain of 47.5 substitutions 
during the twelve periods. Sixty-nine were apparently hindered, show- 
ing an average loss of 58.6 substitutions on Test II. 

The average number of interferences for the total group was 18.9 on
Test I and 22.5 on Test II. The difference between the mean
interferences on the two tests (3.6) yields a $D/SD_{\text{diff}}$ of 1.8.
Fifty-five of the subjects showed an average increase in number of
interferences on Test II of 10.3. Thirty-seven subjects evidenced less
interference, averaging 5.9 fewer errors on Test II.

The number of digits written during the thirty seconds allotted 
for that practice showed no relationship to IQ ratings (0.037 ± .067)
but there was an indication that the abilities involved in writing digits
and making letter-digit substitutions were slightly related, the product-
moment correlation coefficient being 0.36 ± .058.

Table I shows the average IQ’s of those subjects who gained or lost 
in substitutions and interferences on Test II. The twenty-nine cases 
of positive transfer with regard to substitutions had an average IQ of 
106.5 as compared with an average IQ of 111.8 of those who showed 
negative transfer. In interference scores the fifty-five cases who made 
more interferences on Test II had an average IQ of 112.7 while the 
thirty-seven subjects who seemed less disturbed had an average IQ of 
107.4. Indications seem to point toward greater interference on the 
part of the brighter subjects. 


TABLE I.—AVERAGE IQ RATINGS WITH RESPECT TO GAIN AND LOSS IN NUMBER
OF SUBSTITUTIONS AND INTERFERENCES ON TEST II

                                               No. of    Average     SD
                                               cases       IQ
Substitutions
  Gain on Test II (positive transfer).........   29       106.5     11.7
  Loss on Test II (negative transfer).........   69       111.8     12.1
  No change on Test II........................    2       110.0      6.3
Interferences
  Gain on Test II (negative transfer).........   55       112.7     12.2
  Loss on Test II (positive transfer).........   37       107.4     12.0
  No change on Test II........................    8       106.5      9.8





Considering the individuals who showed negative transfer most 
definitely to be those who gained in interferences and, at the same time, 
lost in substitution score, and the ones who showed true positive trans- 
fer to be those who had fewer interferences and gained in substitution 
score on Test II, there were forty-two cases of negative transfer, the 
group having an average IQ of 113.9, as compared with fourteen cases of 
positive transfer showing an average IQ of 105.1.


TABLE II.—AVERAGE IQ RATINGS WITH RESPECT TO GAIN AND LOSS ON TEST II
IN SUBSTITUTION AND INTERFERENCE SCORES COMBINED

Interferences    Substitutions                No. of    Average     SD
                                              cases       IQ
Gain             Loss (negative transfer)       42       113.9     12.5
Loss             Gain (positive transfer)       14       105.1     13.3




















The data were further analyzed by tabulating the positive or nega- 
tive transfer effects with respect to percentile rank of IQ’s. In Table 
III the results are epitomized rather forcefully. The brightest twenty- 
five per cent of the group shows the greatest interference as indicated 
both by loss in substitution score and gain in number of interferences. 
This group experienced approximately three and one-half times as 
great a decrease in substitutions as did the dullest twenty-five per cent. 
The Interference figures are less striking. All four groups showed some 
degree of negative effect. 


TABLE III.—AVERAGE CHANGE IN SUBSTITUTION SCORES AND INTERFERENCES
ON TEST II COMPARED BY QUARTERS OF IQ DISTRIBUTION

                                                 Average raw change
IQ ratings
                                            Substitutions   Interferences
Highest twenty-five per cent...............     -51.8            6.0
Second twenty-five per cent................     -26.4            4.2
Third twenty-five per cent.................     -17.4            3.5
Lowest twenty-five per cent................     -15.0            4.7











Correlation coefficients obtained regarding change in substitutions 
and interference scores with respect to IQ ratings are not large and do 
not demonstrate the differences with justice to the data. The fact 
that extremes may affect a product moment coefficient so greatly 
seems to operate here to reduce the measure of relationship. Raw 
change in substitutions on Test II correlates -0.127 ± .066 with IQ
ratings. Raw change in interferences on Test II yields an r of -0.218
± .064 with IQ. While neither of these correlations is statistically
significant, both show an inclination on the part of the duller students
to be less affected and the brighter ones more affected by the interference
of previously learned material. Apparently the brighter
children retain better and therefore have greater difficulty in breaking
up former associations than do the duller and poorer retainers.
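
The figures following the ± signs are not named in the text. On the assumption that they are probable errors of r, computed by the formula customary at the time, PE = 0.6745 (1 - r²) / √N with N = 100, the reported values can be reproduced closely, as the following sketch shows. The formula, the function name, and the comparison of r with four times its probable error (one rule of thumb sometimes applied in the period) are assumptions introduced for illustration and are not stated by the author.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Probable error of a product-moment r (assumed formula, see lead-in)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# (r, reported figure after the ± sign), as printed in the text.
reported = {
    "substitution change vs. IQ": (-0.127, 0.066),
    "interference change vs. IQ": (-0.218, 0.064),
}

results = {}
for label, (r, printed) in reported.items():
    pe = probable_error_of_r(r, 100)
    results[label] = (pe, abs(r) / pe)
    print(f"{label}: PE = {pe:.4f} (printed {printed}), |r|/PE = {abs(r) / pe:.2f}")
```

Neither ratio reaches four under these assumptions, which is consistent with the statement that the correlations are not statistically significant.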

Summarizing the results of the study, certain persistent tendencies 
are observed indicating a relationship between degree of brightness, as 
measured by a standard intelligence test, and susceptibility to increased 
interference of former learning with subsequent learning. This inter- 
pretation is based upon the group as a whole. Within the group large 
individual differences are evident, both in direction (positive or nega- 
tive) and extent. Differences would probably also exist in the same 
individual upon different transfer problems. 

The data are too inadequate for conclusions to be drawn. How- 
ever, contrary to definition and to the assumptions of many of the 
intelligence-testing group, our data would seem to point toward the 
possibility that negative transfer is, in some situations, more closely 
associated with higher intelligence than positive transfer. The expla- 
nation may lie in the suggestion that the more intelligent are better 
retainers and that their proficiency in retentiveness causes an inter- 
ference of previously learned materials with learning of similar material 
attempted later.


TEACHERS’ ATTITUDES AND CHILD BEHAVIOR
PROBLEMS


D. B. ELLIS AND L. W. MILLER 
University of Denver 


A comprehensive study, Children’s Behavior and Teachers’ Attitudes,
was made by Wickman in 1928. The present study is based on Wickman’s
work, using his schedule B-4 (p. 206) with modifications.

This study was undertaken for the purpose of comparing the results 
secured by Wickman in 1928 with those secured under certain changed 
conditions from three hundred eighty-two junior and senior high-school 
teachers of Denver in 1935. Wickman’s results are based on five 
hundred eleven teachers from Cleveland, Newark, New York City, and 
three villages in New York and Minnesota. He also secured ratings 
from thirty mental hygienists from child guidance clinics in Cleveland, 
Philadelphia and Newark, N. J. 

A rating scale listing fifty types of problems in the behavior of
children was employed. These were to be rated as to their seriousness.
Wickman’s instructions (schedule B-4) asked the teachers to consider,
“How serious (or undesirable) is this behavior in any child?” In
another place it was asked, “To what extent does it make him a difficult
child?” Each problem was rated on a scale varying from “of no consequence”
to “an extremely grave problem.” In presenting the rating
scale to mental hygienists Wickman’s directions were not the same as
those given to the teachers. The mental hygienists were asked: “What
is your professional opinion of the seriousness or importance of this
behavior when occurring in any school child with regard to its future
effect in limiting his or her happiness, success, and general welfare after
leaving school and on entering adult social and industrial life. In
other words how much will the possession of this behavior by a child
generally handicap him in his future adjustments as an adult.” In
the judgment of the writers the directions given to the teachers and to
the mental hygienists varied too greatly to permit a valid comparison.
In presenting this schedule B to the Denver teachers our directions were
essentially the same as those given to the mental hygienists.

A quotation from Wickman will indicate his emphasis in instructions
to the teachers. He states (pp. 92-93): “The real purpose of the
scale was to measure the teachers’ customary habits of thinking about 
and reacting to the occurrence of troublesome forms of behavior in their
pupils. For this reason the rating scale was introduced to them with
the statement that it represented an effort to secure necessary informa- 
tion in evaluating the seriousness of behavior problems in children, and 
the terminology employed in setting up the scale included such words
as serious, undesirable, misfit, disturbing, problem child, maladjusted.
Stress was laid on the degree of undesirableness of a particular behavior
problem in a child and the amount of difficulty produced in coping with 
the problem, with the hope that, by so directing them, the teachers’ 
emotional reactions to the problems might be elicited. The assump- 
tion was that the degree to which teachers considered a particular form 
of behavior undesirable represented the energy they exerted toward the 
modification of such behavior. As will be seen later, this method also 
afforded an insight into the teachers’ requirements of classroom behavior 
and into their habits of treating the behavior symptoms of 
maladjustment. 

“One additional point of technique was employed to reduce the 
tendency to intellectualize or rationalize in making ratings of this kind. 
A time limit of thirty minutes was imposed for reading the directions and 
making the two hundred ratings required on the two scales (frequency 
of occurrence and seriousness) of the fifty problems as they occurred in 
boys and girls separately. Moreover, the teachers were urged to make 
their ratings as rapidly as possible. By securing their first immediate 
reactions to the problems without permitting much time for rationaliza- 
tion, it was hoped that their everyday responses would be elicited 
rather than their studied intellectual responses indicating what their 
attitudes ought to be.”

Wickman, in his study, determined teachers’ attitudes toward 
children’s behavior by a method which recorded the teacher’s first 
immediate and rather hurried reaction to the problem. The teacher’s 
best professional judgment of the problem was certainly not obtained. 

However, in his questionnaire to mental hygienists, Wickman asked 
them to give their best professional judgment in rating the problems. 
In justifying the difference between the instructions given to the teachers
and to the mental hygienists, Wickman states (p. 118): “In educing the
reactions of mental hygienists to the fifty problems of child behavior, 
it was desired to ascertain their purely intellectual and professional 
opinions freed as far as possible from their native, emotional responses 
to these problems. It will be recalled that in testing the teachers’ 
reactions, the inquiry was exactly the opposite, i.e., the task was to
elicit the everyday responses to the problems freed as far as possible
from whatever rationalizations the teachers might make when subjected
to a ‘test.’ The purpose on the one hand was to study the typical 
attitudes toward behavior that actually prevailed on the part of teach- 
ers, and on the other hand to establish the ideal, as it were, of what the 
attitudes should be for the healthy correction of behavior disorders and 
for the ultimate welfare of the individual child, so far as mental hygiene 
studies at present indicate. 

“By imposing an intellectual control over the ratings made by the
mental hygienists, we hoped to contrast as sharply as possible the 
intellectual with the more spontaneous reactions to behavior problems, 
in order to discern what changes in educational methods are indicated 
for the treatment of behavior disorders and for a fuller realization of 
that cherished goal, ‘educating children for life.’ Unfortunately, in 
ordering the experiment in this way, the results may appear to the 
disadvantage of the teaching profession and very much to the advantage 
of the mental hygienists.” 

It is proposed in this study to obtain the best professional judgment 
of the teachers by modifying somewhat Wickman’s technique. 

Wickman’s schedule B-4 (pp. 206-208) was modified to allow time for 
sound judgment on the problems. The questions to be answered in rating 
the problems were changed from “How serious (or undesirable) is this
behavior in any child, and to what extent does it make him a difficult
child?” to “How much will the possession of this trait by a child
handicap him in his future development and adjustments as an adult?”
This change calls attention to the future seriousness of the problem. 
Further, teachers were cautioned to refrain from considering any trait 
from the standpoint of a problem in classroom management. 
Wickman’s directions tended to direct the thinking of his teachers 
toward classroom situations. 

The fifty behavior problems of Wickman’s Schedule B-4, with the 
graphic rating scale, were not changed. An additional page was used 
with spaces to check statements indicating sex, marital status, school 
position, type of school, teaching experience, recent college or university 
courses in Mental Hygiene, Character Education, and Guidance. (A 
copy of the questionnaire appears in the appendix.) 

A questionnaire was sent to each junior and senior high-school 
teacher in the Public Schools of Denver with the permission and coöperation
of the superintendent of schools. Participation by the teacher
was entirely voluntary. The distribution was so handled that it was 
impossible to determine the identity of any person returning a questionnaire.
(See first page of questionnaire in appendix.) No ratings by
Denver teachers who had read Wickman’s study or who had heard it 
discussed are included in this report. 

In presenting the results secured, a table is given for the total 
Denver group, Wickman’s teachers and the mental hygienists. Space 
does not permit the presentation of tables of minor importance. A 
bound detailed report may be secured from the Library of the Univer- 
sity of Denver. Table I shows the nature and composition of the 
Denver group. 


TABLE I.—NUMBER OF TEACHERS WHO RETURNED QUESTIONNAIRES CLASSIFIED
ACCORDING TO SCHOOL, SEX, MARITAL STATUS, TEACHING EXPERIENCE AND
EDUCATIONAL PREPARATION SINCE 1922

                                      Junior High      Senior High
                                        School           School           Total
Classification
                                      Men   Women      Men   Women      Men   Women
Married.............................   43     67        50     25        93     92
Single..............................    5    122         6     64        11    186
Totals, married and single..........   48    189        56     89       104    278
Under ten years’ experience.........   29     48        21     21        50     69
Over ten years’ experience..........   19    141        35     68        54    209
Courses since 1922..................   35    116        34     39        69    155
No courses since 1922...............   13     73        22     50        35    123























Wickman reports that his teachers and the mental hygienists did not 
agree. The rank order correlation for the two sets of ratings was —.08. 
In the present study the Denver teachers’ ratings correlated .49 with 
the ratings of Wickman’s hygienists. The Denver teachers’ ratings 
correlated .65 with those of Wickman’s teachers. 
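
The rank-order coefficient between the Denver teachers and Wickman’s teachers can be recomputed from the two corresponding columns of Table II. The sketch below assumes Spearman’s rank-difference formula, rho = 1 - 6Σd² / (n(n² - 1)); the text does not say which formula was used. The fifth entry of the Wickman column is garbled in this copy and is taken here as 8, the only rank from 1 to 50 not otherwise present in that column. The function and variable names are illustrative.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-difference coefficient for two untied rankings."""
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Table II is arranged in the Denver teachers' rank order, 1 through 50.
denver = list(range(1, 51))

# Wickman's teachers' ranks for the same problems, read down Table II.
# The fifth value (8) is inferred, as explained in the lead-in.
wickman_teachers = [
    1, 2, 12, 5, 8, 9, 22, 40, 3, 28,
    13, 4, 10, 23, 24, 7, 29, 20, 27, 11,
    33, 16, 17, 36, 25, 48, 6, 31, 45, 35,
    34, 37, 50, 14, 32, 19, 38, 30, 26, 15,
    44, 41, 21, 46, 42, 18, 39, 49, 43, 47,
]

rho = spearman_rho(denver, wickman_teachers)
print(f"rho (Denver teachers vs. Wickman's teachers) = {rho:.3f}")
```

Under these assumptions the coefficient comes out near the .65 reported. The corresponding check against the mental hygienists is not attempted here, since two entries of that column cannot be read with certainty.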

The thirty mental hygienists of Wickman’s study considered with- 
drawing, recessive personality traits as most serious. 

A critical examination of Table II shows withdrawing, recessive 
personality traits scattered in rank-order arrangement as to seriousness 
from seventh place to thirty-second place for the Denver teachers. 
These traits, with their respective rank, are: 


 7. Unhappy, Depressed           24. Fearfulness
 8. Unsocial, Withdrawing        26. Sensitiveness
10. Suggestible                  29. Overcritical of Others
14. Easily Discouraged           32. Suspiciousness

17. Resentful


TABLE II.—RANK ORDER ARRANGEMENT OF BEHAVIOR PROBLEMS OF CHILDREN
BY GROUPS AS INDICATED

                                               Denver      Mental       Wickman’s
Type of problem                                teachers    hygienists   teachers
 1. [illegible]...............................     1          13            1
 2. [illegible]...............................     2          25            2
 3. [illegible]...............................     3          21           12
 4. [illegible]...............................     4          23            5
 5. Cruelty and bullying......................     5           6            8
 6. [illegible]...............................     6          24            9
 7. Unhappy, depressed........................     7           3           22
 8. Unsocial, withdrawing.....................     8           1           40
 9. [illegible]...............................     9          41            3
10. Suggestible...............................    10           8           28
11. [illegible]...............................    11          17           13
12. Obscene notes, pictures, talk.............    12          28            4
13. Destroying school materials...............    13          45           10
14. Easily discouraged........................    14           7           23
15. [illegible]...............................    15          16           24
16. [illegible]...............................    16          37            7
17. Resentful.................................    17           4           29
18. [illegible]...............................    18          19           20
19. [illegible]...............................    19          31           27
20. [illegible]...............................    20          42           11
21. [illegible]...............................    21          11           33
22. [illegible]...............................    22          32           16
23. [illegible]...............................    23          36           17
24. Fearfulness...............................    24           5           36
25. [illegible]...............................    25          38           25
26. Sensitiveness.............................    26          10           48
27. [illegible]...............................    27          22            6
28. [illegible]...............................    28          15           31
29. Overcritical of others....................    29           9           45
30. [illegible]...............................    30          12           35
31. [illegible]...............................    31          35           34
32. Suspiciousness............................    32           2           37
33. [illegible]...............................    33          14           50
34. [illegible]...............................    34          26           14
35. [illegible]...............................    35          20           32
36. [illegible]...............................    36          27           19
37. [illegible]...............................    37          39           38
38. [illegible]...............................    38          43           30
39. [illegible]...............................    39          34           26
40. [illegible]...............................    40          47           15
41. [illegible]...............................    41     [illegible]       44
42. [illegible]...............................    42          18           41
43. [illegible]...............................    43          46           21
44. [illegible]...............................    44          29           46
45. [illegible]...............................    45          33           42
46. [illegible]...............................    46          49           18
47. [illegible]...............................    47          30           39
48. [illegible]...............................    48     [illegible]       49
49. [illegible]...............................    49          48           43
50. [illegible]...............................    50          50           47


Four of the personality traits appearing among the first ten ranked
as most serious by mental hygienists also appear in the first ten con- 
sidered most serious by the teachers of this study. These four are 
“Cruelty and Bullying,” “Unhappy, Depressed,” “Unsocial, Withdrawing,”
and “Suggestible.” This, with the fact that no withdrawing,
recessive personality trait was ranked lower than thirty-second, would
indicate that the teachers of this study are considerably aware of the
seriousness of this type of behavior problem. 

It is interesting to note that “Unsocialness” is placed eighth by the
three hundred eighty-two teachers of this study and fortieth by
the five hundred eleven teachers of Wickman’s study, a rank-order
difference of thirty-two places. Likewise, “Unhappy, Depressed,”
placed seventh in this study, appears twenty-second in Wickman’s, and
“Suggestible,” placed tenth in this study, is placed twenty-eighth by
Wickman’s teachers.

It might be well to suggest at this point that the wide differences 
between teachers’ ratings in this study and the teachers’ ratings in 
Wickman’s study may be due very largely to the difference in approach 
in securing the ratings. As pointed out above, Wickman asked the 
teachers to rate the problems in answer to these questions: “How
serious (or undesirable) is this behavior in any child?” “To what
extent does it make him a difficult child?” A time limit of thirty
minutes was set. No caution was made to refrain from considering 
the problems from the standpoint of classroom management. Thus 
the teachers’ immediate and first reactions to the problem were secured 
and not a critical evaluation or thoughtful judgment of the problem. 

The thirty mental hygienists considered dishonesties, cruelty, 
temper tantrums, and truancy as less serious than withdrawing, 
recessive personality traits. 

Referring to Table II we find problems of dishonesty, temper 
tantrums, and cruelty definitely considered as most serious by the 
Denver teachers. Strangely enough, truancy is not considered nearly 
so serious as the others. 

The next grouping of traits as to seriousness by the mental hygi- 
enists includes immoralities, violations of school work requirements, and 
extravagant behavior traits. Referring again to Denver teachers, 
Table II, we find problems of immorality considered as serious as the 
dishonesties. Violations of school work requirements are not considered
serious. Extravagant behavior traits such as domineering,
sullenness, and interrupting are, on the whole, not considered very
serious.

The mental hygienists’ last grouping includes transgressions against 
authority and violation of orderliness in class. 

For the Denver teachers we find traits such as impertinence,
impudence, and disobedience, which are transgressions against authority,
considered as rather serious. Problems of classroom orderliness
such as whispering, disorderliness, and inattention are considered least
serious.

From the foregoing, we can formulate the direction of three hundred
eighty-two teachers’ reactions to the seriousness of behavior traits.


[In the following material, where groups of characteristics or traits are 
classified under a single heading such as dishonesties, the classification as to 
seriousness was determined by summating ranks.] 

Most Serious.—Dishonesties, Immoralities, Cruelty, Temper Tantrums.

Serious.—Withdrawing, Recessive personality and behavior traits,
Transgressions against authority, Extravagant behavior traits.

Least Serious.—Truancy, Violation of school work requirements, and
Disorderliness in class.


For comparison, the grouping of the traits as rated by the five 
hundred eleven teachers of Wickman’s study and the groupings as 
rated by the thirty mental hygienists are repeated below. 


SERIOUSNESS AS RATED BY FIVE HUNDRED ELEVEN TEACHERS IN WICKMAN
(P. 115)


Most Serious.—Immoralities, Dishonesties, Transgressions against 
authority. 

Serious.—Violations of classroom order and application to school tasks. 

Less Serious.—Extravagant, Aggressive personality traits.

Least Serious.—Withdrawing, Recessive personality traits. 


SERIOUSNESS AS RATED BY THE THIRTY MENTAL HYGIENISTS (P. 130) 


Most Serious.—Withdrawing, Recessive personality and behavior traits. 

Serious.—Dishonesties, Cruelty, Temper Tantrums, Truancy.

Less Serious.—Immoralities, Violations of school work requirements, 
Extravagant behavior traits. 


Least Serious.—Transgressions against authority, Violations of orderliness 
in class. 


From the foregoing it would seem that the significant difference
between the ratings of the teachers in this study and those of Wickman’s
is the increased realization of the seriousness of the Withdrawing,
Recessive personality traits.

Reports of three similar studies show that teachers consider
“Violations of general standards of morality” and “Transgressions
against authority” as the most serious types of behavior traits. The
teachers in this study also consider those types as most serious.

Perhaps there is some justification in this attitude. Bain points
out in her conclusions (J:45) (1): “There is some occasion for caution
lest the newer standards lose sight of social integrity. If children 
need not show consideration for ‘authority, either of man or the Deity,’ 
what are they to have in its place? In guidance of children, extremes 
of liberty or license are to be avoided as much as extremes of discipline, 
and the teacher must be ever on the alert to keep a whole balance of 
values, personal and social, in guiding the child.” 

Present standards of society impose requirements for certain types 
of behavior and exact retribution from transgressors. Offenders who
steal are in serious difficulty (if caught). The person who violates 
these standards of social conduct certainly is handicapped in his 
success in making adjustments to the social group. Such traits as 
impudence, impertinence, and temper outbursts are frowned on in adult 
society, and the person who habitually exhibits them is unpopular with 
his associates and finds difficulty in making happy adjustments in his 
contacts with society. 

While withdrawing, recessive personality traits may be serious from 
the standpoint of mental health, it is perhaps well to remember that the 
possession of these other types of undesirable traits may also seriously 
handicap an adult in making adjustments. 

The second phase of this study deals with differences in attitudes 
between the groups of teachers classified according to sex, marital 
status, type of school, teaching experience, educational preparation 
since 1922 in Mental Hygiene, Character Education, or Guidance. 
These further comparisons are made within the Denver group. Teach- 
ers who had taken courses in character education, guidance or mental 
hygiene (since 1922) gave almost the same ratings as those who had 
not taken such courses. Since courses entitled guidance and character 
education vary so greatly in content these probably should have been 
omitted. 

Some other results are briefly stated: 

(1) Women consistently rated problems as more serious than the
men.

(2) Men with over ten years’ experience rated the problems as less
serious than did those with under ten years’ experience.

(3) Women with over ten years’ experience rated the problems as 
more serious than did those with under ten years’ experience. 

(4) No consistent differences were found between married and 
unmarried teachers. 

(5) No differences were found between junior and senior high-school 
teachers. 


(6) It also seems reasonable to conclude that questionnaire results
are greatly influenced by the directions given and by the time allowed
for the ratings.


APPENDIX 
Denver Public Schools, Department of Research and Curriculum 
BEHAVIOR PROBLEMS 


Explanation of Purpose of Study 


Do not sign your name or indicate your school anywhere on this question- 
naire. 

The various types of behavior problems of children are constantly being 
encountered by both parents and teachers. Anything which will help us to 
properly meet these problems should be of value both to parents and teachers 
and to the children whose future development depends in part on how we 
treat their problems. Our present knowledge of child psychology has caused 
us to change some of our ideas of children’s behavior. In this preliminary 
survey we hope to find out which behavior problems most seriously affect the 
future development of the child. 

Many of the behavior traits of children are annoying to parents and 
teachers, but do not very seriously affect the child’s future development. For 
instance, whispering is annoying in the classroom, but probably does not 
seriously affect the child’s adult adjustments. On the other hand, stealing 
may become a serious problem in the child’s future development. 

In order to secure the free and unbiased judgment of each person who returns 
a questionnaire, no identifying marks of any kind have been placed on the blank. 
The distribution is so handled that it will be impossible to determine the identity 
of any person returning a questionnaire. As soon as you have finished marking 
the blank, place it in the attached envelope, seal it, and return it to the depart- 
ment of research and curriculum. 

This study is being made coöperatively by the department of research and
curriculum and Mr. Douglas B. Ellis, Boys’ Adviser at Horace Mann Junior
High School. Your coöperation is, of course, voluntary. If you will return
this, if possible, not later than May 28, 1934, we shall appreciate your help.


Explanation and Directions for Marking


A list of behavior problems obtained in a study by Professor E. K. Wick- 
man under the auspices of the Commonwealth Fund of New York is attached. 
Although these problems are all problems of classroom management, please 
consider them for the purposes of this study only in relation to the following 
question: How much will the possession of this trait by a child generally 
handicap him in his future development and adjustments as an adult? 

1. At the top of the rating scale are four statements indicating the various 
degrees of seriousness of the traits, least serious at the left, most serious at the 
right. 

2. After each trait there is a line on which you make your rating according 
to the seriousness of the item. 

3. Rate each trait by making a vertical stroke like this (/) at the appro- 
priate place on the line according to the caption at the top. 

4. You may make your rating at any point on the line, either on a divisional 
point or anywhere between. This will permit you to distinguish finely between 
the different behavior problems. 

5. Rate only the seriousness of the problem, not the frequency. 

6. Please do not consult with anyone in answering this questionnaire. 

7. The following example will illustrate: 

How much will the possession of this trait by a child generally handicap 
him in his future development and adjustment as an adult? 





Of no            Of only slight     A serious      An extremely
consequence      consequence        problem        grave problem

[Example trait with its rating line; not legible in this copy]

















Please bear in mind these limitations as you rate the items: 

1. These traits may occur in almost any child at one time or another. In 
your ratings please interpret each separate trait as being manifested just 
frequently enough to make it a problem. For instance, if a pupil is tardy 
once in a semester, we could hardly consider him a tardiness problem, but if 
he were tardy on an average of once a week, we would consider him a problem. 

2. Any one of these traits when extended to the extreme may produce very 
serious difficulties in the child’s future. Confine your ratings to the usual 
development of the traits as in the example above. If the pupil were tardy 
every day of the semester, that would be an extreme case. For the purposes 
of this study we are not concerned with extreme cases of any one trait.

3. Refrain from considering any trait from the standpoint of a problem in
classroom management.

4. Rate each trait in answer to this question: How much will the possession
of this trait, to the extent illustrated above, generally handicap him in his
future development and adjustment as an adult?


THE PROBLEM OF PRINCIPAL COMPONENTS:
DERIVATION OF HOTELLING’S METHOD FROM 
THURSTONE’S 


CHESTER E. KELLOGG 
McGill University 


In Professor Thurstone’s recent volume on factor analysis (The 
Vectors of Mind, University of Chicago Press, 1935) there is given, on 
pages 129-132, in the chapter on “The Principal Axes,” a critique
of Hotelling’s method of “Analysis of a Complex of Statistical Variables
into Principal Components,” published in the September and October
issues of The Journal of Educational Psychology for 1933. This method
is referred to as a discussion of “the special case of the method of principal
axes in which unity is recorded in the diagonals of” the correlation
matrix. Comment follows on the failure of the method to recognize
the existence of other than common factors—errors of measurement, 
sampling errors in the correlations, and specific factors. On page 131, 
it is stated that Hotelling’s iteration method—the practical method 
which Hotelling offered as a means of approximation to the exact 
solution—might, like the centroid method, be used to determine the 
minimum number of factors to reproduce the correlation matrix within 
the limits of error “if it could be modified so as to use communalities 
instead of unity in the diagonals.” 

A glance through Hotelling’s second section, which gives the 
derivation of his method, shows that he does neglect to take into 
account possible specific factors. He does consider errors of measure- 
ment, using unity in the diagonals with coefficients corrected for 
attenuation, or (cf. the close of the first section) raw correlations with 
reliabilities in the diagonals. Sampling is taken up in section 6; it is 
difficult to see how it could be incorporated in the method proper. 
Thurstone does not attempt to do so. Now, inspection of the notation 
of Hotelling’s equations (6), (15), and (16), and especially the step 
from (16) to (17), shows that he bases the self-correlations, which he 
records as unity for corrected coefficients, on common factors only. 
Accordingly, it is evident that, when specific factors exist, Hotelling’s 
proof holds only for the reduced matrix. So the only modification 
required to use communalities in the diagonals is to put them there. 
But the method derives the use of the correlation matrix in the 
characteristic equation from the assumption that the determinant of the 
test-factor matrix does not vanish, i.e., that there are as many 
independent factors as tests. The question naturally arises whether this 
condition is necessary as well as sufficient; whether the method is 
effective in cases with fewer factors. 

This question was first put to a practical test by a trial of Hotel- 
ling’s iteration method with the correlation matrix given as Table 5, 
on page 129 in Thurstone’s volume, and using the communalities given 
in Tables 2 and 4. The success of the test will be shown by the 
convergence of the working weights, and comparison of the final weights 
with those given by Thurstone. For the first axis, the initial weights 
used were roughly proportional to the sums of the columns of the cor- 
relation matrix. The successive working weights are tabulated here: 








TABLE I 

                                   Test 
Trial      1        2        3        4        5        6        7 

  1       .7        …        …       .4       .8       .3       .9 
  2       .75      1.      -.28      .25      .73      .5        … 
  3       .74      .96     -.41      .07      .65      .60       … 
  4       .74      .94     -.49     -.01      .55      .64       … 
  5       .73      .93     -.53     -.07      .52      .64       … 
  6       .73      .93     -.56     -.1       .51      .68       … 
  7       .73      .92     -.58     -.12      .50      .69      1. 
  8       .73      .92     -.59     -.13      .49      .7        … 
  9       .729     .919    -.594    -.134     .487     .702      … 
 10       .729     .919    -.596    -.137     .484     .703      … 
 11       .7296    .9184   -.5973   -.1384    .4838    .7043     … 


























The sums of the products in the Test 7 column are given next, for 
the last few trials, to show the approach to the first root of the 
characteristic equation: 2.8102; 2.8407; 2.8427; 2.8480; 2.8481. A closer 
approximation might have been secured by use of more decimals, but 
this did not appear worth the labor. So this value was divided by the 
sum of the squares of the last weights obtained, and the square root 
taken to give the reducing factor, +0.9044. Multiplying, and setting 
down the results to the nearest thousandth, we have for comparison 
with Thurstone’s weights the data in Table II. 

TABLE II 

Test                    1      2      3       4       5      6      7     First factor root 
First factor weight    .660   .831   -.540   -.125   .437   .637   .904        2.8481 

Thurstone’s roots, listed on page 126, opus cit., are negative, because he adds the 
scalar matrix to the factor matrix to form his characteristic matrix, while Hotelling 
subtracts; the discrepancy is of no practical importance. 

514 The Journal of Educational Psychology 
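The iteration just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `hotelling_iteration` and `reducing_factor` and the small three-test matrix are hypothetical, since the correlation matrix of Thurstone's Table 5 is not reproduced in this article. The last lines repeat the arithmetic of the text, taking the Test 7 working weight as 1 (which is what the weight .904 in Table II implies).

```python
import math


def hotelling_iteration(r, weights, tol=1e-7, max_trials=500):
    """Repeat w <- R w, rescaling each time so the largest weight is 1.

    Returns the converged working weights and the estimate of the largest
    root (the sum of products in the column whose weight is held at 1).
    """
    n = len(r)
    root = 0.0
    for _ in range(max_trials):
        products = [sum(r[i][j] * weights[j] for j in range(n)) for i in range(n)]
        pivot = max(products, key=abs)  # largest sum of products
        new_weights = [p / pivot for p in products]
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights, root = new_weights, pivot
        if change < tol:
            break
    return weights, root


def reducing_factor(weights, root):
    """Square root of (root / sum of squared working weights)."""
    return math.sqrt(root / sum(w * w for w in weights))


# Hypothetical reduced correlation matrix R = FF' (three tests, two factors).
f = [[0.8, 0.3], [0.6, -0.4], [0.5, 0.2]]
r = [[sum(x * y for x, y in zip(fi, fj)) for fj in f] for fi in f]

w, root = hotelling_iteration(r, [1.0, 1.0, 1.0])
loadings = [reducing_factor(w, root) * x for x in w]

# Arithmetic of the text: final working weights (Test 7 assumed to be 1)
# and the first root 2.8481 give the reducing factor.
last_weights = [0.7296, 0.9184, -0.5973, -0.1384, 0.4838, 0.7043, 1.0]
print(round(reducing_factor(last_weights, 2.8481), 4))  # 0.9044
```

Scaling by the largest sum of products keeps the working weights bounded, and the squared loadings sum to the root, as the reducing factor requires.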

The first factor correlations were formed, to four places, and the 
residuals recorded. First trial weights were chosen more closely 
this time, by taking values nearly proportional to the square roots of 
the remaining communalities. The series of second factor working 
weights are tabulated next: 








TABLE III 

                                   Test 
Trial      1        2        3        4        5        6        7 

  1       .17      .36      .90       …       .88     -.5       .16 
  2       .15      .33      .83       …       .77     -.42      .135 
  3       .155     .343     .822      …       .772    -.437     .139 
  4       .156     .342     .828      …       .763    -.437     .140 
  5       .1559    .3423    .8284     …       .7627   -.4372    .1398 
  6       .1559    .3433    .8284     …       .7652   -.4378    .1398 


























The reducing factor was found to be 0.7706, and multiplication gave the 
following second factor weights: 


TABLE IV 

Test                     1       2       3       4       5       6        7      Second factor root 
Second factor weight    .1211   .2645   .6384   .7706   .5897   -.3374   .1078        1.55878 

















Formation of the second factor correlations, and subtraction from 
the first factor residuals, gave the following distribution of absolute 
values of the second factor residuals, in ten-thousandths: 








TABLE V 

Residual (unit 0.0001)    0    1    2    3    4    5    6    7    8    9   ...   12   13 
Frequency                 3    8    6    4    3    7   11    0    3    2   ...    1    1 


























Median residual: Five ten-thousandths. 


Hotelling’s method having been shown to succeed in a case with 
fewer factors than tests, it is of interest to discover a proof that this is 
not an accident, but can be expected in general. A study of the 
make-up of the two characteristic matrices gives a clue. Thurstone 
has already shown, as his fundamental factor theorem, that “The 
product of the factorial matrix and its transpose is the reduced 
correlation matrix”; or symbolically, “FF’ = R.” (Opus cit., page 70.) 
Inspection of the matrix of his characteristic equation, which may 
appropriately be denoted by T, shows that the factorial matrix 
premultiplied by its transpose is the matrix of the characteristic equation 
in factor terms; or F’F = T. We require then a proof that two 
symmetrical matrices formed in this way will give the same characteristic 
equations. Stated formally, the theorem is: Given two symmetrical 
matrices, R and T, with constant elements, and such that R = FF’, and 
T = F’F, then the characteristic matrices R − kI and T − kI will give 
the same characteristic equation. For the case of as many factors as 
tests, so that the determinant of the factorial matrix does not vanish, 
the theorem is covered by the converse, which Bôcher says is obvious 
and so does not prove, of Theorem 3 in Chapter XXI of his Introduction 
to Higher Algebra, which reads: “If a₁ and a₂ are two matrices 
independent of λ, a necessary and sufficient condition that a non-singular 
matrix p exist such that a₂ = p a₁ p⁻¹ is that the characteristic matrices 
A₁ and A₂ of a₁ and a₂ have the same invariant factors, or, if we prefer, 
the same elementary divisors.” But for this special case, the validity of 
our theorem is implied by the fact with which we began, that 
Thurstone’s method, which holds for any number of factors up to and 
including the number of tests, and Hotelling’s, which he proves only for the 
case of equality, both give the principal axes, which are defined as those 
which maximize the contributions of common factors to the 
variances. So we are especially concerned with the cases of inequality in 
the number of tests and factors (in terms of the theorem from Bôcher, 
p singular), which Bôcher does not mention. Quite likely our general 
theorem is also obvious to the professional mathematician, but, in any 
case, it will do no harm, and may help to clarify the problem for other 
psychologists, as it has for the writer, to return to first principles, and 
develop the proof by use of the complete notation of factor analysis. 
(Although our theorem is stated only for symmetrical matrices, with 
which alone we are concerned, this restriction is not necessary. The 
more general theorem, that while matrix multiplication is not 
commutative with respect to matrices as such, it is so with respect to the 
elementary divisors of the resultant characteristic matrices, is valid, but 
less convenient to treat, as the expansions give more terms.) 
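We begin with the case of three tests and two common factors, 
and set down the four matrices: 

TABLE VI 

\[
F = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix},
\qquad
F' = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{pmatrix}
\]

\[
R = \begin{pmatrix}
a_{11}^2 + a_{12}^2 & a_{11}a_{21} + a_{12}a_{22} & a_{11}a_{31} + a_{12}a_{32} \\
a_{21}a_{11} + a_{22}a_{12} & a_{21}^2 + a_{22}^2 & a_{21}a_{31} + a_{22}a_{32} \\
a_{31}a_{11} + a_{32}a_{12} & a_{31}a_{21} + a_{32}a_{22} & a_{31}^2 + a_{32}^2
\end{pmatrix}
\]

\[
T = \begin{pmatrix}
a_{11}^2 + a_{21}^2 + a_{31}^2 & a_{11}a_{12} + a_{21}a_{22} + a_{31}a_{32} \\
a_{12}a_{11} + a_{22}a_{21} + a_{32}a_{31} & a_{12}^2 + a_{22}^2 + a_{32}^2
\end{pmatrix}
\]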


Assuming linear independence, all four matrices are of rank two. 
So the determinant of R must vanish. Equating coefficients of the 
characteristic equations, the sum of the two-rowed principal minors 
of R must equal the determinant of T, and the sums of the elements 
of the principal diagonals of R and of T must be equal. That the latter 
holds good is plain at a glance, the first diagonal element of T being the 
sum of the first items in the diagonal elements of R, the second element 
of T the sum of the second items in R. The two-rowed minors of R 
and the determinant of T will be expanded to test their relationship: 

TABLE VII 

\[
\begin{aligned}
R_{1,2} &= a_{11}^2a_{22}^2 + a_{12}^2a_{21}^2 - 2a_{11}a_{12}a_{21}a_{22} \\
R_{1,3} &= a_{11}^2a_{32}^2 + a_{12}^2a_{31}^2 - 2a_{11}a_{12}a_{31}a_{32} \\
R_{2,3} &= a_{21}^2a_{32}^2 + a_{22}^2a_{31}^2 - 2a_{21}a_{22}a_{31}a_{32} \\
T_{1,2,3} &= a_{11}^2a_{22}^2 + a_{12}^2a_{21}^2 + a_{11}^2a_{32}^2 + a_{12}^2a_{31}^2
  + a_{21}^2a_{32}^2 + a_{22}^2a_{31}^2 \\
  &\quad - 2a_{11}a_{12}a_{21}a_{22} - 2a_{11}a_{12}a_{31}a_{32} - 2a_{21}a_{22}a_{31}a_{32}
\end{aligned}
\]

The minors sum to the determinant, as required. Examination of 
the structure of the matrices and the process of expansion shows clearly 
that the inclusion of additional tests in the system will simply add 
more minors of the same type to R and more items to correspond in the 
expansion of T. Four tests, still with two factors, will give three 
additional minors: 

TABLE VIII 

\[
\begin{aligned}
R_{1,4} &= a_{11}^2a_{42}^2 + a_{12}^2a_{41}^2 - 2a_{11}a_{12}a_{41}a_{42} \\
R_{2,4} &= a_{21}^2a_{42}^2 + a_{22}^2a_{41}^2 - 2a_{21}a_{22}a_{41}a_{42} \\
R_{3,4} &= a_{31}^2a_{42}^2 + a_{32}^2a_{41}^2 - 2a_{31}a_{32}a_{41}a_{42}
\end{aligned}
\]

In the expansion of T, a glance shows that the products of the 
corresponding items in the diagonal elements cancel out, while the cross 
products remain. Supplying the new items in the elements of T as 
set down in Table VI for three tests, it is obvious that the new items in 
the expansion will be exactly those given by the three new minors of 
Table VIII. 
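The coefficient comparison can also be checked numerically. The sketch below is an illustration only: the loadings in `f` are made-up values for four tests and two factors, and the helper names (`det`, `minor_sum`, `gram_pair`) are hypothetical. It forms R = FF’ and T = F’F and compares the sums of principal minors that appear as coefficients of the two characteristic equations.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total


def minor_sum(m, k):
    """Sum of all k-rowed principal minors of the square matrix m."""
    n = len(m)
    return sum(
        det([[m[i][j] for j in idx] for i in idx])
        for idx in combinations(range(n), k)
    )


def gram_pair(f):
    """Return R = FF' (tests by tests) and T = F'F (factors by factors)."""
    rows, cols = len(f), len(f[0])
    r = [[sum(f[i][k] * f[j][k] for k in range(cols)) for j in range(rows)]
         for i in range(rows)]
    t = [[sum(f[i][p] * f[i][q] for i in range(rows)) for q in range(cols)]
         for p in range(cols)]
    return r, t


# Hypothetical loadings: four tests (rows), two factors (columns).
f = [[0.8, 0.3], [0.6, -0.4], [0.5, 0.2], [0.7, 0.1]]
r, t = gram_pair(f)

print(abs(minor_sum(r, 1) - minor_sum(t, 1)) < 1e-12)  # principal diagonals agree
print(abs(minor_sum(r, 2) - det(t)) < 1e-12)           # two-rowed minors of R sum to |T|
print(abs(minor_sum(r, 3)) < 1e-12)                    # higher minors of R vanish
```

A numerical check of this kind does not replace the algebraic argument; it only confirms the pattern for one set of values.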

Turning now to the results of a change in the number of factors, 
let us first drop the second factor (Spearman’s case). The four matrices 
become: 


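TABLE IX 

\[
F = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \\ a_{41} \end{pmatrix},
\qquad
F' = \begin{pmatrix} a_{11} & a_{21} & a_{31} & a_{41} \end{pmatrix}
\]

\[
R = \begin{pmatrix}
a_{11}^2 & a_{11}a_{21} & a_{11}a_{31} & a_{11}a_{41} \\
a_{21}a_{11} & a_{21}^2 & a_{21}a_{31} & a_{21}a_{41} \\
a_{31}a_{11} & a_{31}a_{21} & a_{31}^2 & a_{31}a_{41} \\
a_{41}a_{11} & a_{41}a_{21} & a_{41}a_{31} & a_{41}^2
\end{pmatrix}
\]

\[
T = a_{11}^2 + a_{21}^2 + a_{31}^2 + a_{41}^2
\]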

T thus has only one element, which is the sum of the principal 
diagonal elements of R. The 2-rowed minors of R all vanish— 
Spearman’s tetrad equations. We have found that inclusion of a 
second factor led to two matrices R and T based upon different arrange- 
ments of the same ultimate items, but that in the process of expansion, 
only the identical products from the two sources survived. A test with 
a third factor should give further insight. 

R and T for four tests and three factors take up so much space 
that they cannot well be reproduced here. They are easily set down 
by analogy with the examples of Table VI. The sums of the principal 
diagonals are found to correspond in the same systematic way as 
before. Turning to the next coefficient, which requires equality 
of the sums of the two-rowed principal minors of R and of T, the first 
such minor of each determinant will be given, to indicate their form: 

TABLE X 

\[
\begin{aligned}
R_{1,2} &= a_{11}^2a_{22}^2 + a_{11}^2a_{23}^2 + a_{12}^2a_{21}^2 + a_{12}^2a_{23}^2
  + a_{13}^2a_{21}^2 + a_{13}^2a_{22}^2 \\
  &\quad - 2a_{11}a_{21}a_{12}a_{22} - 2a_{11}a_{21}a_{13}a_{23} - 2a_{12}a_{22}a_{13}a_{23} \\
T_{1,2} &= a_{11}^2a_{22}^2 + a_{11}^2a_{32}^2 + a_{11}^2a_{42}^2 + a_{21}^2a_{12}^2
  + a_{21}^2a_{32}^2 + a_{21}^2a_{42}^2 \\
  &\quad + a_{31}^2a_{12}^2 + a_{31}^2a_{22}^2 + a_{31}^2a_{42}^2 + a_{41}^2a_{12}^2
  + a_{41}^2a_{22}^2 + a_{41}^2a_{32}^2 \\
  &\quad - 2a_{11}a_{12}a_{21}a_{22} - 2a_{11}a_{12}a_{31}a_{32} - 2a_{11}a_{12}a_{41}a_{42} \\
  &\quad - 2a_{21}a_{22}a_{31}a_{32} - 2a_{21}a_{22}a_{41}a_{42} - 2a_{31}a_{32}a_{41}a_{42}
\end{aligned}
\]
Five more minors of the same form for R, and two more for T, giving 
a grand total of fifty-four items in each case, may be listed by making 
the proper substitutions of subscripts, in view of the symmetry. In the 
minors from R, the substitutions will affect only the first subscripts; 
in those from T, the second. Order of multiplication within the 
items of the expansions being indifferent, it is clear that only items of 
the same form remain after cancellation, and that all the permissible 
combinations remain in each case, so the sums are identical, as required. 
We come finally to the coefficient of the absolute term, comparing the 
sum of the three-rowed minors of R with the determinant of T. As 
expansion of the four principal minors of R requires five hundred 
forty-eight multiplications of the items composing the elements, and 
that of the determinant of T requires three hundred eighty-four, it 
would be a rather tedious affair. Fortunately, the symmetry of the 
matrices, and their composition from the same ultimate items, makes 
this unnecessary. Inspection shows that, as before, all the differing 
products formed in the process of expansion cancel out, while any 
product which remains after the cancellation in the expansion of R can 
be found also in the expansion of T, and vice versa. 

It seems therefore justifiable to conclude, without at present 
attempting a formal statement in the phraseology of “mathematical 
induction,” that our theorem holds good when there are fewer factors 
than tests. 
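
The expansion-by-expansion argument can also be condensed into a single determinant identity. What follows is a standard result of matrix algebra (commonly attributed to Sylvester), offered here as a sketch and not as the route taken in this article. The symbols n (number of tests), m (number of factors), and the identity matrices I_n and I_m are introduced for the illustration; F is the n by m factorial matrix. Evaluating one bordered determinant in two ways, for k not equal to zero,

\[
\det\begin{pmatrix} kI_n & F \\ F' & I_m \end{pmatrix}
= \det\left(kI_n - FF'\right)
= \det(kI_n)\,\det\left(I_m - k^{-1}F'F\right)
= k^{\,n-m}\,\det\left(kI_m - F'F\right),
\]

so that, with R = FF' and T = F'F,

\[
\det\left(kI_n - R\right) = k^{\,n-m}\,\det\left(kI_m - T\right).
\]

Both sides are polynomials in k, so the equality holds for k = 0 as well. On this reading the two characteristic equations have the same non-zero roots, and R carries n − m additional zero roots when there are fewer factors than tests.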

In the reverse case, with fewer tests than factors, the symmetry 
shows that the relationship between the matrices would be similar, but 
would yield a transformation in the reverse sense, from known factors 
to a possible series of statistically independent tests, showing how many 
such tests might be required, and the proper weights, to assess the 
abilities of individuals, if, on independent grounds, fundamental 
abilities had been found to be not orthogonal, but related to each other 
functionally, though genetically separable. Much more work along 
the lines suggested in Thurstone’s later chapters must be done before 
this reverse case can be expected to have any practical value. 

The reader will probably find it interesting to round off the present 
approach to our problem by comparing the matrices R and T of low 
order for equal numbers of tests and factors. Identity of the expan- 
sions is easily verified. 

Nothing in our discussion need be taken to imply that the rank of 
R and T is as great as the order of T. The only requirement is that the 
elements of R be such as would result, within the limits of error, from 
the same factor matrix which, pre-multiplied by its transpose, gave T. 
If the actual rank is less than the order of T, zero roots to the 
appropriate number will appear in the characteristic equation. In ordinary 
practice, in working from R stated in terms of ordinary correlation 
coefficients, there is of course no preliminary assumption as to the 
number of factors involved, except that it will not be greater than the 
order of R. 

We conclude, then, that Hotelling’s method of obtaining the 
principal components implies the use of the reduced matrix R (the 
correlation matrix with communalities in the principal diagonal), 
and will give the same results as Thurstone’s method of rotating 
a system of axes otherwise obtained. The method may therefore be 
linked up with Thurstone’s system, as an alternative to the centroid 
method of discovering the minimum number of independent factors 
required to describe the data—an alternative much easier than Thur- 
stone’s own iterative method, given in his Theory of Multiple Factors, 
Ann Arbor, 1933. In fact, Hotelling’s iterative method amounts to an 
abridgement of Thurstone’s comparable to the saving involved in 
using F in the characteristic equation rather than T. The methods 
having thus been brought together, Thurstone’s discussion will naturally 
be further supplemented by the later sections of Hotelling’s work. 
It is to be hoped that this unification of their techniques may facilitate 
progress in research. 


ADDENDUM 


The comparison of the Thurstone and Hotelling iterative methods 
in the last paragraph of this article was rather impressionistic. A 
rereading of Thurstone’s pamphlet, after an interval of five months, 
has led to an interesting discovery. It is true that the derivation 
by the method of least squares used by Thurstone, and dealing 
explicitly with the factor weights, takes up more space and seems more 
involved than Hotelling’s operations directly with the correlation 
matrix. It is also true that the final technique arrived at by Thurstone 
includes the operations used by Hotelling, with a number of additional 
calculations. But a more careful study shows that these extra steps 
are due, not to the least squares approach as such, but to an assump- 
tion on Thurstone’s part which is in conflict with his own usual pro- 
cedures. The essence of the problem is to find a set of factor weights 
that, when used to estimate the correlations, will minimize the squares 
of the residuals. Table 2, on page 32 of Thurstone’s pamphlet cited 
above, represents the approximation in factor terms, and involves 
the assumption that, as far as the first factor is concerned, the self 
correlation $r_{aa} = a \cdot a = a^2$, so that his first column sum after 
multiplication becomes $a\sum k^2$. To this there is no objection. But he 
continues: “If we do likewise for Table 1” (which represents the actual 
operation on the correlation matrix), “we have 
$a \cdot a^2 + b \cdot r_{ab} + c \cdot r_{ac} + d \cdot r_{ad} = a^3 + \sum k \cdot r_{ak}$.” 
That is, the true diagonal values 
being unknown, he supplies the same value as given in the approximation 
matrix. This amounts to an assumption that one factor is sufficient, 
or an underestimation of the communalities. A test with the 
problem treated in this article yields, as is to be expected, factor 
weights that are too low. In Thurstone’s later pamphlet giving the 
centroid method (A Simplified Multiple Factor Method, University of 
Chicago Bookstore, 1933), he recommends using the highest coefficient 
of each column of the correlation matrix as an approximation to the 
true communalities. If he had followed this plan with the least 
squares iteration method, the resulting technique would have been 
identical with Hotelling’s (using communalities in the diagonal), as the 
additional steps, used in getting from one set of trial weights to the 
next, are required by the use of the cubes of the weights in the diagonal 
cells. So only a failure to conform to his own teachings prevented 
Thurstone’s discovery of the technique since given us by Hotelling. 








RELATIONSHIPS BETWEEN CONSTANCY OF 
EXPRESSED PREFERENCES AND CERTAIN 
OTHER FACTORS¹ 


JACK W. DUNLAP 
Fordham University 


It has been demonstrated in an earlier paper by Dunlap² that there 
is a positive correlation of about .50 between expressed preferences or 
interests for a particular school subject and success in that subject. 
Therefore, it is possible to use the scores from such an interest blank 
for predicting future success, provided that the interests of the 
individual are relatively permanent. Thus the value of such instruments for 
prediction depends on the constancy of the responses to the test items by 
the individuals. This paper deals with the results of an investigation 
(1) to determine the constancy of individuals’ responses to particular 
items, (2) to see if responses to items in certain subject-matter fields are 
more constant than those of other fields, (3) to determine if individuals 
are equally constant in all subjects and (4) to determine the 
relationships between a measure of constancy for an individual and other 
variables. 

Fig. 1. The distribution of the constancy indexes of two hundred fourteen items, 
expressed as the percentage of students who marked the item identically on two separate 
administrations of the Preference Blank. M = 54.3; σ = 16.1. 

¹ Presented before the New York Branch of American Psychological 
Association, April 11, 1936. 
² Dunlap, Jack W.: “Preferences as Indicators of Specific Academic 
Achievement.” Jour. Educ. Psychol., Vol. XXVI, 1935, pp. 411-415. 

One hundred and forty-six seventh-grade students marked the 
Dunlap Preference Blank on two occasions separated by an interval 
of ten months. The percentage of individuals marking each item 
identically on the two occasions was determined and this percentage 
taken as the constancy index for the item. The distribution of the 
constancy values for the two hundred fourteen items in the blank is 
shown in Fig. 1. The items vary, in terms of the constancy index, from 
nineteen per cent to ninety-two per cent, with a mean index value of 
54.3 per cent and a standard deviation of 16.1. 
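
The item constancy index described above is a simple percentage of agreement. The sketch below is illustrative only: the four students, three items, and the response codes "L", "I", "D" are invented (the article does not list the response options of the blank), and `item_constancy` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def item_constancy(first, second):
    """Percentage of students marking each item identically on both occasions.

    first, second: one list per student, each holding that student's
    responses to the items in the same order.
    """
    n_students = len(first)
    n_items = len(first[0])
    return [
        100.0 * sum(first[s][i] == second[s][i] for s in range(n_students)) / n_students
        for i in range(n_items)
    ]


# Invented responses: four students, three items, two administrations.
first = [["L", "D", "I"], ["L", "L", "D"], ["D", "I", "I"], ["I", "L", "L"]]
second = [["L", "D", "D"], ["L", "I", "D"], ["D", "I", "L"], ["L", "L", "L"]]

indexes = item_constancy(first, second)
print(indexes)  # [75.0, 75.0, 50.0]
print(round(mean(indexes), 1), round(pstdev(indexes), 1))
```

Whether the article's standard deviation was computed with N or N − 1 in the denominator is not stated; `pstdev` (denominator N) is an assumption.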


TABLE I. MEAN ITEM CONSTANCY BY SUBJECT-MATTER FIELD, THE CORRESPONDING 
STANDARD DEVIATIONS, THE STANDARD ERRORS OF THE MEANS AND THE 
DIFFERENCES BETWEEN THE MEANS. ASTERISKS INDICATE DIFFERENCES 
THAT EQUAL OR EXCEED 2σd 

Columns 1 to 7: Arithmetic, Geography, Grammar, History, Hygiene, 
Literature, General interests. 

                         N     M      σ     σM       1       2       3       4       5       6       7 
1. Arithmetic           31   46.5    7.5   1.3      ...      *      ...     ...      *       *       * 
2. Geography            38   54.8    8.2   1.3     -8.3     ...      *      ...     ...      *       * 
3. Grammar              22   43.2   13.7   2.9      3.3    11.6     ...      *       *       *       * 
4. History              34   50.9   11.8   2.0     -4.4     3.9    -7.7     ...     ...      *       * 
5. Hygiene              29   53.8   11.1   2.1     -7.3     1.0   -10.6    -2.9     ...      *       * 
6. Literature           34   64.1   11.4   2.0    -17.6    -9.3   -20.9   -13.2   -10.3     ...     ... 
7. General interests    25   65.4   13.4   2.7    -18.9   -10.6   -22.2   -14.5   -11.6    -1.3     ... 






































Each item in the blank refers to a particular subject-matter field; for 
example, the item Aesop’s Fables is classified as literature. Items were 
classified in seven subject-matter fields—arithmetic, geography, 
grammar, history, hygiene or physiology, literature, and general 
interests. Items classified as general interests were those items that 
had differentiated between successful and non-successful students as 
determined by the total score on an achievement test battery. The 
mean item constancy and the standard deviation were determined for 
each field. The distributions of the items for the various classifica- 
tions, with the exception of those items classified as general interest, 
closely followed the normal curve. 

The mean item constancy, the standard deviations, the differences 
between the means, and the significant differences are shown in Table I. 
The mean item constancy varies from 65.4 for general interests to 
43.2 for grammar (see column 2). The subjects ranked in descending 
order as to constancy are general interest, literature, geography, 
hygiene, history, arithmetic, and grammar. Items classified as general 
interests or as literature not only elicit much more constant responses 
than those of the other subject-matter fields studied, but the differences 
between these subjects and the other fields are statistically significant. 
Items classified as grammar are, as a whole, lower in constancy than 
the other fields, and in five of six instances, the difference is statistically 
significant. The rank of hygiene is too high, due to the fact that the 
items were unknown to most children. Therefore, although the 
constancy index indicates constancy of response, it does not indicate 
constancy of interest. 
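
The significance statements above can be retraced from the means and standard errors printed in Table I. The sketch below assumes that the standard error of a difference was taken as the square root of the sum of the two squared standard errors, and that a difference counts as significant when it reaches twice that value; the article does not spell out its formula, so both points are assumptions, and `difference_ratio` is a hypothetical helper name.

```python
import math

# (mean item constancy, standard error of the mean), as printed in Table I
fields = {
    "Arithmetic": (46.5, 1.3),
    "Geography": (54.8, 1.3),
    "Grammar": (43.2, 2.9),
    "History": (50.9, 2.0),
    "Hygiene": (53.8, 2.1),
    "Literature": (64.1, 2.0),
    "General interests": (65.4, 2.7),
}


def difference_ratio(a, b):
    """Difference between two field means divided by its assumed standard error."""
    (m1, se1), (m2, se2) = fields[a], fields[b]
    return (m1 - m2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Grammar against each of the other six fields.
significant = [
    name for name in fields
    if name != "Grammar" and abs(difference_ratio("Grammar", name)) >= 2
]
print(len(significant))  # 5
```

Under these assumptions grammar differs reliably from every field except arithmetic, which agrees with the "five of six instances" reported in the text.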

The constancy index was determined for each individual by divid- 
ing the number of items he had marked identically on the two trials 
by the total number of items, i.e., two hundred fourteen. These 
constancy indexes varied from sixteen to seventy-four with a mean of 
54.6 and a standard deviation of 10.5, with the form of the distribution 
closely approaching that of a normal frequency curve. Later in this 
study the relationship between constancy and intelligence will be 
reported. No data were available to determine if the extremely low 
deviates were also emotionally unstable. 

Next the number of identical responses on the two blanks, for each 
subject-matter field, was determined for each student. The ratio of 
the number of identical responses within a particular field to the total 
number of items in that field furnished a measure of constancy of the 
individual for that field. The variability from subject field to sub- 
ject field was determined for each student by computing the standard 
deviation of his constancy indexes. The distribution of these standard 
deviations shows that certain individuals are more than ten times as 
variable from subject to subject as other individuals, as measured 
in terms of the magnitude of their standard deviations. Thus there is 
both inter- and intra-individual variability. That is, individuals differ 
from each other in the constancy of their preference, and they also vary 
in their constancy of interests from subject field to subject field. 
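
The per-field constancy of one student, and the standard deviation used as his measure of variability, can be sketched as follows. The responses, the field labels, and the helper name `field_constancy` are invented for the illustration, and the use of the population standard deviation (`pstdev`) is an assumption.

```python
from statistics import pstdev


def field_constancy(first, second, fields):
    """Percentage of identical responses within each subject-matter field.

    first, second: one student's responses on the two occasions;
    fields: the field label of each item, in the same order.
    """
    totals, same = {}, {}
    for a, b, field in zip(first, second, fields):
        totals[field] = totals.get(field, 0) + 1
        same[field] = same.get(field, 0) + (a == b)
    return {field: 100.0 * same[field] / totals[field] for field in totals}


# Invented data for one student: six items in three fields.
fields = ["arithmetic", "arithmetic", "geography", "geography", "literature", "literature"]
first = ["L", "D", "L", "I", "L", "D"]
second = ["L", "D", "I", "I", "L", "L"]

by_field = field_constancy(first, second, fields)
variability = pstdev(by_field.values())
print(by_field)               # arithmetic 100.0, geography 50.0, literature 50.0
print(round(variability, 2))  # 23.57
```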

The rest of this report deals with the inter-relations of individuals’ 
constancy by subject-matter field, the correlations between subject 
constancy and achievement in the same field as measured by the 
Metropolitan Achievement test and the relation between constancy 
and intelligence as measured by the Terman Group Test of Mental 
Ability. Complete data were available on only six fields as the 
achievement test used did not contain a measure of hygiene. 

The intercorrelations of the several constancy measures vary from 
.17 to .65 with corresponding standard errors of .08 and .05. The 
intercorrelations between geography, history, and literature are the 
highest. Constancy in these groups of items is more highly correlated 
with constancy of general interests than with constancy of preferences for 
the fields of arithmetic and grammar. 

TABLE II. INTERCORRELATIONS BETWEEN THE SEVERAL CONSTANCY INDEXES, 
AND SUBJECT CONSTANCY WITH ACHIEVEMENT (COLUMN 7) AND CONSTANCY 
WITH INTELLIGENCE (COLUMN 8). N = 146 

                          1      2      3      4      5      6      7      8 
1. Arithmetic            ..    .39    .30    .24    .20    .17    .12    .06 
2. Grammar               ..     ..    .41    .35    .39    .26    .32    .22 
3. Geography             ..     ..     ..    .65    .56    .40    .29    .24 
4. History               ..     ..     ..     ..     …      …      …      … 
5. Literature            ..     ..     ..     ..     ..     …      …      … 
6. General interests     ..     ..     ..     ..     ..     ..     …     .28 

The correlations between constancy of response to a subject-matter 
field and achievement in that field are all positive as shown in column 
7 of Table II, and in each case greater than the corresponding 
correlation of subject constancy with intelligence (column 8). Thus the 
constancy of attitude toward a subject is more closely related to the 
individual’s success in that subject than to his intellectual ability. 

TABLE III. THE MEAN CONSTANCY BY SEX FOR THE SEVERAL SUBJECTS, 
TOGETHER WITH THE STANDARD DEVIATIONS, THE SEX DIFFERENCES, AND 
THE CRITICAL RATIOS. SIXTY-NINE GIRLS AND SEVENTY-SEVEN BOYS 

                  Arithmetic   Geography   Grammar   History   Literature   General interests 
M (girls)            50.7        51.1       44.0      46.3       62.1            66.0 
M (boys)             42.2        57.5       41.0      54.4       63.9            64.0 
Diff.                 8.5        -6.4        3.0      -8.1        1.8             2.0 
σ (girls)            12.3        14.9       14.8      13.7       12.3            13.2 
σ (boys)             16.1        16.2       12.2      18.0       15.9            12.8 
Critical ratio        3.6         2.5        1.3       3.1         .8              .9 
The measure of individual variability discussed earlier, i.e., the 
standard deviation of a student’s constancy indexes, when correlated 
with intelligence gave .07, indicating that individuals of either high or 
low intelligence are likely to vary radically in their constancy of 
interests from subject to subject. 

Finally the question of sex differences in constancy of response to 
the various subjects was investigated with the results shown in 
Table III. In four of the six fields girls exhibit a greater constancy of 
expressed interest, but only in the case of arithmetic is the difference 
statistically significant. The greater constancy of the girls in arith- 
metic is the more interesting when it is recalled that in achievement in 
arithmetic boys usually surpass girls. The boys, however, were more 
constant in their responses to geography and history items than 
were the girls and the differences between the sexes are statistically 
reliable. 
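
The critical ratios of Table III can be retraced from the means, standard deviations, and group sizes. The sketch below assumes the usual formula for two independent groups (difference divided by the square root of the summed squared standard errors) and assumes that the first row of means and of standard deviations in Table III belongs to the sixty-nine girls; neither point is stated explicitly in the article, and `critical_ratio` is a hypothetical helper name.

```python
import math


def critical_ratio(m1, s1, n1, m2, s2, n2):
    """Difference between two means divided by the standard error of the difference."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / standard_error


# Values as printed in Table III; girls (n = 69) first, boys (n = 77) second.
arithmetic = critical_ratio(50.7, 12.3, 69, 42.2, 16.1, 77)
geography = critical_ratio(51.1, 14.9, 69, 57.5, 16.2, 77)
history = critical_ratio(46.3, 13.7, 69, 54.4, 18.0, 77)

print(round(arithmetic, 1), round(geography, 1), round(history, 1))  # 3.6 -2.5 -3.1
```

The magnitudes agree with the critical ratios printed for arithmetic, geography, and history.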


SUMMARIZING 


1. One hundred forty-six seventh-grade students marked the 
Dunlap Academic Preference Blank on two occasions separated by an 
interval of ten months. 

2. The percentage of identical responses to each item was deter- 
mined. They varied from nineteen per cent to ninety-two per cent 
with a mean constancy of 54.3 per cent. 

3. Items were grouped according to subject-matter field and the 
constancy for each subject field was determined. Items classified 
as general interests and literature were distinctly superior to items in 
the field of geography, hygiene, history, arithmetic, and grammar in 
the order named. The grammar items as a group exhibited the least 
constancy. However, highly constant items were found in each field, 
indicating that batteries of items eliciting constant responses can 
be built up for each field. 

4. Individuals varied from sixteen per cent to seventy-two per cent 
of identical responses for the entire blank with a mean constancy of 
54.6 per cent. 

5. The variation in constancy of response from field to field was 
determined for each student and it was found that some students varied 
ten times as much as others, i.e., the standard deviations of their 
constancy indexes varied from 2.4 to 25. 

6. The intercorrelations of constancy indexes were all positive and 
varied from .17 to .65. The relationships between geography, history, 
and literature were the highest. 

7. Constancy in all cases was positively correlated with achievement 
and intelligence; the relationships with achievement were the 
greatest in each case. 

8. Girls were more constant in their attitude towards arithmetic 
than boys, but the opposite is true with regard to geography and 
history. Differences were found in favor of the girls for the other 
subject fields but they were not statistically reliable. 
A NEW INTERESTS TEST* 


G. R. GILES 
Vocational Guidance Officer, Education Department, Victoria, Australia 


The general method of interest testing consists of the presentation 
of a series of verbal stimuli, and the scoring of responses on an occupa- 
tional scale, agreement between the subject’s score and the occupa- 
tional norm being taken as indicative of an interest in, and possibly 
suitability for, that particular occupation. 

Where the degree of correspondence of a subject’s interests with 
those of a specific occupational group is required, this method has its 
advantages. For the investigation of occupational interests of boys 
and girls such a procedure is less satisfactory. A sounder device for the 
examination of occupational interests of adolescents is to test for a 
group of interests and to employ more definite stimuli than are found in 
existing scales. 

The utilization of pictures presenting specific occupational situations 
enables a test to be constructed which may be scored for interests in 
different fields or groups, these of course depending on the details of 
the construction of the test. One such test, intended to determine a 
boy’s interest in “manual” occupations, “mental” occupations, or 
“social” occupations has been developed and has yielded promising 
results. 


In the selection of the pictures four criteria had to be satisfied:

(I) The picture must show some activity which might give expression
to an interest in work with things (“manual”), work with papers (“mental”),
or work with people (“social”).

(II) The picture must depict some situation likely to rouse a specific
emotional attitude.

(III) The picture must represent some occupation for which children have
expressed a preference.

(IV) The picture must represent some occupation available in Victoria.

A series of thirty photographs, each satisfying the above criteria and
showing men at work, was collected and mounted on a large sheet.





* The work reported herein is part of a larger enquiry into the use of aptitude
tests in educational guidance financed by a grant from the Australian Council for
Educational Research.
The test was given to a group of two hundred fifty-six boys aged
ten years six months to twelve years six months in Grade VI in eight
Melbourne primary schools. The boys were taken in groups of
approximately twenty and asked to look at the first picture in the series,
and to answer the question: “Would you like to be the man in the
picture?” Each child had been supplied with a sheet containing the
numbers of the pictures, opposite each of which had been placed the
symbols “Yes,” “No,” and “?,” and space for recording the reason for
the answer. For an affirmative response, the child had to draw a
circle around “Yes” on the answer sheet. For a negative reply, the
circle was to be drawn around “No.” If the child did not mind either
way, or did not know what the picture represented, a circle had to be
drawn around the question mark. The reason for the child’s response
was then sought.

Scoring was carried through by giving the rating +1 for each like,
−1 for each dislike, and 0 for each “did not know” or “did not mind.”
As pictures of the three types had been scattered through the series in
an indiscriminate order, it was felt that the possible sources of error due
to a child marking all likes or all dislikes, or adopting any scheme
of marking other than the true (at the time) expression of his thoughts,
would by this means be eliminated.
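
The scoring rule just described may be sketched in a few lines of Python. This is an illustration only: the assignment of pictures to scales (`ITEM_SCALE`) is hypothetical, since the key is not given here, and treating a missing entry as 0 is likewise an assumption.

```python
from collections import defaultdict

# Ratings described in the text: +1 for a like, -1 for a dislike,
# 0 for "did not know" or "did not mind".
RATING = {"Yes": 1, "No": -1, "?": 0}

# HYPOTHETICAL item-to-scale key; the article does not list which
# picture belongs to which scale.
ITEM_SCALE = {1: "manual", 2: "mental", 3: "social",
              4: "manual", 5: "social", 6: "mental"}


def score_sheet(responses, item_scale=ITEM_SCALE):
    """Return the summed rating per scale for one child's answer sheet.

    `responses` maps picture number to "Yes", "No", or "?".
    A missing entry is treated as 0 (an assumption).
    """
    totals = defaultdict(int)
    for item, scale in item_scale.items():
        totals[scale] += RATING.get(responses.get(item, "?"), 0)
    return dict(totals)


# HYPOTHETICAL answer sheet for one boy.
example = {1: "Yes", 2: "No", 3: "?", 4: "Yes", 5: "No", 6: "No"}
scores = score_sheet(example)
print(scores)
```

Because each scale is summed separately, a child who circles “Yes” throughout obtains equally high totals on all three scales and so shows no differential preference among them.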

It was anticipated that this means of testing would give a more 
reliable indication of vocational interest than could be secured by asking 
the question: “What would you like to be when you grow up?” or 
by using linguistic interest sheets. 

The results justified the expectations. There were differences in 
the reactions to the pictures and a satisfactory scatter of scores, these 
approximating to the normal curve. Though a number of cases 
received zero rating in each test, the ratings were not the same for each 
type of interest, and the distribution of the ranks proved sufficient to 
provide a general indication of the strength of interest in the particular 
type of occupation represented. 

The test was given to one hundred forty-nine boys of the group after
the lapse of a period of twelve months, and information was obtained as to
(a) the permanency of occupational preferences and (b) the permanency
of interests in manual, mental, or social occupations.

The first preferences of this group were mainly for occupations of a
“manual” type, eighty-seven, or 59 per cent, of the first preferences
being allocated to this kind of career. Thirty-one, or 21 per cent, wished
to enter a “mental” occupation, and twenty, or 13 per cent, desired a
“social” occupation, while eleven, or 7 per cent, omitted to indicate
any preference.


PERMANENCY OF OCCUPATIONAL PREFERENCES 


Of the one hundred forty-nine boys, ninety stated their first
preferences to be the same on the two occasions. Those originally desiring
a “manual” or a “mental” occupation were most stable in their
choices, 68 per cent of both groups having the same first preference
after the passing of the twelve months. Twenty-two per cent of those
preferring “manual” occupations changed from one occupation to
another in the same group, i.e., from one occupation essentially manual
in type to another of the same classification, e.g., from “plumber” to
“electrician.” The remaining 10 per cent desired an occupation
in one of the other groups, e.g., from “engineer” to “grocer.” The
32 per cent of boys originally desiring a “mental” occupation who
changed their minds all desired an occupation outside this group at the
second enquiry.

Of those preferring “social” occupations, thirteen, or 65 per cent,
gave the same preference at the end of the year; 20 per cent had changed
to another occupation in the same group; and 15 per cent had changed
to an occupation outside the group.

Of the fifty-nine cases in which changes had occurred, it was found 
that the changes were within the group in twenty-six cases and that 
thirty-three had changed to occupations outside the group. Eleven
of the latter were from “no choice” to a definite preference.

A radical change in occupational preference occurred in only twenty- 
two cases. In seven of these, an examination of the history of the boys 
indicated that new school experiences were probably contributory 
factors; in one case the change was due to pressure from the parent, and 
in another it was attributed to an extending occupational horizon. 
In the remaining thirteen instances, as far as can be seen, the school
influence had little, if any, effect.

Summing up the position in regard to occupational preferences, in 
65 per cent of boys there was permanency of interest over a period of 
twelve months. When changes within the group are disregarded, the
percentages of permanency of interest in a specific occupational group
(manual, mental, or social) were: manual, 90 per cent; mental, 68 per
cent; and social, 85 per cent.

Though such high indications of continued interest in the same 
broad classification of occupations were revealed, the information 
at present obtained by considering occupational preferences does not
enable one to say which of the cases are likely to be those whose 
preference will remain constant, and therefore whose desire should be 
definitely considered. More reliable indications on this point were, 
however, obtained by considering the permanency of interest scores. 


PERMANENCY OF INTEREST SCORES 


Correlations between the raw scores on the two administrations of 
the tests were calculated. These assume equivalent distribution of 
scores and equal validity of response to each item, which was not the 
case. It was found that certain items were more reliable indicators of 
interest in one or other of the groups at Grade VI level than at Grade 
VII standard. Because of this source of error, it was decided to divide
the scores in each test into five comparable groups (A = first 10 per
cent, B = next 20 per cent, C = middle 40 per cent, D = next 20
per cent, and E = lowest 10 per cent) and to determine the relation
between these scores.
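
The division into five groups may be illustrated by the following Python sketch. It assumes that pupils are ranked from highest to lowest raw score and that ties are broken by sort order; the article does not say how tied scores were treated, and the raw scores shown are hypothetical.

```python
def assign_bands(scores):
    """Assign A-E bands by rank: top 10%, next 20%, middle 40%,
    next 20%, lowest 10%.

    `scores` maps a pupil identifier to a raw score. Ties are broken by
    sort order (an assumption; the article does not describe tie handling).
    """
    cutoffs = [(0.10, "A"), (0.30, "B"), (0.70, "C"), (0.90, "D"), (1.00, "E")]
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    n = len(ranked)
    bands = {}
    for position, pupil in enumerate(ranked):
        fraction = (position + 1) / n  # share of pupils at or above this rank
        for upper, label in cutoffs:
            if fraction <= upper + 1e-9:  # small tolerance for float rounding
                bands[pupil] = label
                break
    return bands


# HYPOTHETICAL raw scores for ten pupils.
raw = {f"p{i}": s for i, s in enumerate([9, 7, 6, 5, 4, 3, 2, 1, 0, -2], start=1)}
bands = assign_bands(raw)
print(bands)
```

Banding by rank within each administration makes the two sets of ratings comparable even when the raw score distributions differ between Grade VI and Grade VII.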


CORRELATIONS BETWEEN INTEREST SCORES (PICTURE TEST)

“Manual” Score First Test and “Manual” Score Second Test (twelve
months later)
r = +.56    N = 135
“Mental” Score First Test and “Mental” Score Second Test (twelve
months later)
r = +.41    N = 138
“Social” Score First Test and “Social” Score Second Test (twelve months
later)
r = +.45    N = 143


Percentage of Cases with Same Numerical Score

“Manual” = 20 per cent    ± 2 points: 49 per cent
“Mental” = 21 per cent    ± 2 points: 47 per cent
“Social” = 18 per cent    ± 2 points: 42 per cent


On the “Manual” Scale, sixty-three cases had the same rating in both tests;
forty-four cases had within plus or minus one division of the same rating in
both tests, and one hundred seven cases had from minus one to plus one of the
same rating in each test, i.e., 80 per cent of cases were in the same division, or
plus or minus one of the same division in both tests.

On the “Mental” Scale, forty-four cases had the same rating in both tests;
fifty-one had within plus or minus one division of the same rating in both tests,
and ninety-five cases had from minus one to plus one division of the same rating
in each test, i.e., 69 per cent of cases were in the same division, or plus or
minus one of the same division in both tests.

On the “Social” Scale, forty-eight cases had the same rating in both
tests; seventy-six cases had within plus or minus one division, and ninety-one
cases had from minus one to plus one of same rating in each test, i.e., in 86.5
per cent of cases the rating fell within the range of minus one division to plus
one division of the rating obtained in the previous test.
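
The tallies above are of three kinds: cases with the same division on both tests, cases exactly one division apart, and the sum of the two (for the “Manual” scale, 63 + 44 = 107). A Python sketch of such a tally follows; the paired ratings are hypothetical and serve only to show the counting.

```python
BANDS = "ABCDE"  # the five divisions used in the text, highest to lowest


def division_agreement(first, second):
    """Return (same, adjacent, within_one) counts for paired A-E ratings."""
    same = adjacent = 0
    for a, b in zip(first, second):
        gap = abs(BANDS.index(a) - BANDS.index(b))
        if gap == 0:
            same += 1
        elif gap == 1:
            adjacent += 1
    return same, adjacent, same + adjacent


# HYPOTHETICAL ratings for eight boys on the two administrations.
first_test = list("ABBCCCDE")
second_test = list("ABCCCEDD")
same, adjacent, within_one = division_agreement(first_test, second_test)
percent_within_one = 100 * within_one / len(first_test)
print(same, adjacent, within_one, percent_within_one)
```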


The results reported in the above statements indicate that the rating 
secured on one administration of the test is likely to be repeated on a 
second administration twelve months later. The numerical scores 
received may not be the same, for the population as a whole has 
changed, but the individuals comprising the group do not greatly 
change in their relations one to the other. In other words, a boy
securing a certain rating in “manual” or “social” interests is likely to
secure a rating within one division of it when retested; i.e., interests in
the occupational group remain relatively constant over a period of
twelve months. Whether this permanency of interest persists for a 
longer period is a matter for further investigation. 

It may be considered that the above tentative conclusion omits to 
take cognisance of the effect of memory or of the reliability of the test. 
Both these factors were investigated. 

In the first administration of the test, no mention was made of the 
proposal to re-administer the enquiry and, in the second test, no prep- 
aration was given and no steps taken to refer to the previous question- 
naire. Further, papers were collected immediately the children had 
finished. If the same reply was given in the second test as in the first
test, are not the chances strongly in favor of its being a genuine
expression of interest, rather than the result of memory? When the thirty
stimuli are considered, the possibility of the child remembering his
responses to all and repeating the results is remote.

The reliabilities of the “manual” and “social” scales were
determined by the Spearman-Brown formula from odd-even ratings. The
correlation between the scores for “manual” interests on the two halves
of the test, using odd and even items, was r = .88 ± .03. This gives
the reliability of the “manual” scale to be .94. Similarly for the
“social” scale, the correlation between the two halves was
r = .95 ± .02, giving the reliability to be .98.
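
The step from a half-test correlation to a full-test reliability can be shown with the Spearman-Brown formula in its usual form, 2r / (1 + r) for a test of doubled length. The Python sketch below applies it to the “manual” half-test correlation of .88; the function name and the `factor` argument are illustrative.

```python
def spearman_brown(r_half, factor=2.0):
    """Step a part-test correlation up to the reliability of a test
    `factor` times as long (factor=2 for odd-even halves)."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


manual_reliability = spearman_brown(0.88)
print(round(manual_reliability, 2))  # 0.94
```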
The scores for the “mental” scale were not calculated on account
of the small number of items and the irregular distribution of scores.

These reliability coefficients are considerably higher than those 
determined by the re-test technique. This result is not unexpected, 
since the second administration of the test is not so much a measure 
of the reliability of the test as a measure of the permanency of interest 
in a specific field. As one hundred per cent permanency of interest
has not been recorded, the correlation coefficients given on page 530
cannot be accepted as “reliability coefficients,” since in addition to errors
due to the imperfections of the instrument, the variations due to
alteration of interest are also grouped in this figure. The coefficients of
page 530 are much less than the true values, these being the coefficients
indicated on page 531.

The above data indicate high internal consistency for the test and 
demonstrate the soundness of the new procedure. 

The determination of the validity of the test had to be carried out
by indirect methods. It was assumed that, when summed together,
the items of the test would represent the true feeling attitude of the
examinee. What the test actually measured, though, cannot be
determined completely from the data at present available. An
interpretation of the scores made on the test can be secured by noting
the occupational preference of the child, the occupation he enters,
and the type of school he enters. It was assumed that boys with
“manual” interests predominating would enter schools with a
“manual” course (junior technical schools), and that boys with
“social” and “mental” interests in the ascendant would be likely to
go to schools of a “non-manual” type (secondary schools). Such a
division as this is somewhat arbitrary, for one pupil may secure high
ratings in all three scales, but if pupils with higher “manual” than
“social” scores are found in junior technical schools and those with
higher “social” and “mental” scores are found in schools taking a
secondary course, then the scores are of prognostic value and the
test is thus indirectly validated.

The “manual” ratings of all boys in the junior technical schools
who had completed the first year and who were included in the
experimental group, and those of similar pupils in the secondary schools, were
obtained and compared with the rating of the same pupil on the
“social” scale. Seventy per cent of boys in the junior technical school
group had a higher “manual” rating than “social” and 17 per cent had
a higher “social” rating than “manual.” For the secondary school
boys, 60 per cent had a higher “social” rating than “manual” and
30 per cent had a higher “manual” rating than “social.” This
indicates that the test as a whole does differentiate between junior technical
and secondary school boys on the basis of their occupational interests
and, therefore, is a valid instrument for the determination of interests


PERCENTAGE OF RESPONSES TO PICTURES (BOYS). TEST RESPONSES OBTAINED
PRIOR TO ENTRANCE TO SCHOOL

            Junior technical boys          Secondary school boys
Item No.    Yes per cent   No per cent     Yes per cent   No per cent
 1          46             46              35             48
 2          42             30              24             52
 3          15             67              15             67
 4          21             60              24             59
 5          12             73              11             63
 6           9             60               7             63
 7          18             58               5             60
 8          58             30              37             43
 9          24             64              17             54
10          33             52              26             48
11           3             70              26             48
12          27             46              28             52
13           9             58              22             48
14          42             48              54             28
15          15             73              22             67
16          15             79               7             76
17          18             76               7             71
18          36             48              15             63
19          15             73              30             37
20          24             55              30             41
21          15             70              17             67
22          12             55              37             43
23          18             60              17             59
24          21             64              15             65
25          12             70               9             71
26          55             33              30             48
27          24             60              24             50
28          18             52              37             39
29          15             70              13             52
30           -             88               2             67

Note: The sum of “Yes per cent” and “No per cent” does not reach 100 per
cent. The differences are due to “No entry,” “Do not know,” and “Do not
mind” responses.
in “manual” or “social” occupations. (The “mental” ratings were
not included on account of an insufficient number of items to give a 
reliable separation.) 

The test as a whole includes common interests. If items which are 
found to be present in both technical and secondary groups to the 
same extent were removed, a simpler testing device and at the same 
time a more selective one would be available. The responses of the 
junior technical and secondary groups to each symbol on the question- 
naire were obtained with a view to the elimination of non-selective 
items. The percentage of responses to each symbol for each group is 
given in the table on page 533. 

Ream’s method of scoring was employed. Where the difference
between the responses of the two groups to any symbol was greater
than the standard error of the difference, the symbol marked by the
junior technical group was given +1 and the reverse score for the
secondary group was given −1. Yule’s formula

$$E_{P_1 - P_2} = \sqrt{\frac{P_1 Q_1}{N_1} + \frac{P_2 Q_2}{N_2}}$$

where $E_{P_1 - P_2}$ is the standard error of the difference that is sought,
$P_1$ the per cent of one group circling the item,
$Q_1$ the per cent of this group not circling the item,
$P_2$ the per cent of the other group circling the item,
$Q_2$ the per cent of that group not circling the item,
$N_1$ and $N_2$ the numbers of the individuals in the two groups
respectively,
was used to determine the standard error.
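
As an illustration, the criterion may be applied to a single item. The Python sketch below assumes that the formula is the usual standard error of a difference between two percentages, the square root of P1·Q1/N1 + P2·Q2/N2, and that Q is simply 100 − P; both are assumptions. The “Yes” percentages for picture two are taken from the table, and the group sizes of thirty-three and forty-six are those reported for the junior technical and secondary groups.

```python
import math


def se_difference(p1, n1, p2, n2):
    """Standard error of the difference between two percentages.

    Assumes Q = 100 - P for each group (all pupils not circling the symbol).
    """
    q1, q2 = 100.0 - p1, 100.0 - p2
    return math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


# "Yes" percentages for picture two; N = 33 junior technical boys and
# N = 46 secondary school boys, as reported in the article.
p_technical, p_secondary = 42.0, 24.0
se = se_difference(p_technical, 33, p_secondary, 46)
difference = abs(p_technical - p_secondary)
print(round(se, 2), difference > se)
```

On these assumptions the difference of 18 points for picture two exceeds its standard error of about 10.65, which agrees with that picture being retained. The sketch is not offered as a reconstruction of the complete scoring key.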

Significant differences were found for pictures two, eight, eleven, 
fourteen, eighteen, nineteen, twenty-two, twenty-six, twenty-eight, 
twenty-nine, and thirty. These were then used in a new scoring key 
and the other items disregarded as non-significant. 

The papers of the group were re-marked and the scores noted. 
The critical score for the group was +2. Below this score, interest 
was found to be in the direction of secondary courses, and above this, 
the interest was found to be in junior technical work. Using this as the 
dividing line, twenty-nine of the thirty-three boys in the junior
technical group were correctly placed, and thirty-two of the forty-six
boys in the secondary group were correctly placed, i.e., eighty per cent
of the junior technical and seventy per cent of the secondary school
pupils were correctly selected on the basis of their interests or,
combining, seventy-seven per cent of the pupils were correctly classified
by the test. This result indicates a high validity for the scale and also
suggests the possibility of the use of this test in selecting pupils for
post-primary courses of varying types.

In order to determine both the permanence of interest as measured 
by the revised inventory and the correctness of the selection on the 
basis of the recommendation made on the results of the first test, the 
second test papers of each boy were rescored using the revised basis of 
marking. Seventy per cent of the boys in the junior technical group 
and eighty-three per cent of the boys in the secondary group were still 
correctly classified. When these two results were combined, it was 
found that seventy-seven per cent of the pupils were still correctly 
placed by the test. Sixty per cent of pupils in both groups had been 
correctly selected by the two tests. 

These data confirm the previous conclusion, indicating a high degree
of permanence of interest in the “manual” or “social” group of
occupations, and also strengthen the estimate previously given that the test is
a valid measure of interests in “manual” or “social” occupations.

Before the test can be employed with assurance, it must be shown 
that it will select pupils from other groups. It may be that the original 
group of boys possessed common interests that are absent from other 
groups; consequently, the re-application of the test is necessary before 
the principle can be accepted as sound. 

The test papers of another group of fifty-five boys who proceeded
to secondary or junior technical schools were marked by the revised
scale. It was found that sixty per cent of the junior technical school
boys and eighty per cent of the secondary school boys were correctly
designated by the test or, on the total population, seventy-three per
cent were correctly classified by their “interests” as measured by this
scale.

The pictures test is, therefore, one which does distinguish between 
pupils suited for secondary school work and those fitted for junior 
technical work. It is not claimed that this test should be the only 
criterion to be employed, but it is considered that the use of a testing 
device of the kind described will give valuable additional data in the 
distribution of pupils between various courses at the end of the primary 
school course. The improvement of the test by the rejection of non- 
discriminatory items and its extension by the inclusion of further valid 
items will form an additional study. 
WHAT DOES THE “PICTURES” TEST MEASURE?


It has been shown that the “pictures” test will distinguish between
junior technical and secondary school pupils when the nature of their
responses to the various stimuli is considered. But is the test a
measure of interests, or is it a test of abilities masquerading in another 
guise? The test of interests devised by the National Institute of 
Industrial Psychology for use in its London experiment apparently 
produced only another measure of general intelligence. Was this test 
likewise a measure of either general or special abilities, or did it explore 
a portion of the personality not closely related to these? 

Correlations were calculated between the boys’ scores in an intelli- 
gence test and the manual interest, mental interest, and social interest 
ratings, and also between the scores on a mechanical ability test and 
the manual interest ratings. The intelligence test used was the
Teachers College Group Test (devised by the Melbourne Teachers’
Training College) and the mechanical ability test employed was the 
Minnesota Paper Form Board, Series A. Correlations are given in the 
following table: 


CORRELATIONS BETWEEN INTEREST RATINGS AND ABILITY SCORES
Intelligence Test Score and Manual Interest Score
N = 210    r = +.10 ± .04
Intelligence Test Score and Mental Interest Score
N = 212    r = +.03 ± .04
Intelligence Test Score and Social Interest Score
N = 210    r = +.05 ± .04
Mechanical Ability Test Score and Manual Interest Score
N = 209    r = +.04 ± .04


The above data show that the “pictures” test is not a measure
of general ability, but of some specific factor with which “g” has
practically zero correlation. The evidence so far suggests that the
test is a valid, reliable measure of interests in “manual” or “social”
occupations.
QUALITATIVE WHOLES: A RE-VALUATION OF THE
WHOLE-PART PROBLEM 


MAY V. SEAGOE 
University of California at Los Angeles 


The possible difference in efficiency between learning by units as 
opposed to learning by segments has received much attention in psy- 
chological literature, yet nowhere has there been a definition of terms. 
Detailed reviews have appeared previously, notably that of 
G. O. McGeoch, and repetition would be fruitless. Certain general 
trends of investigation and of result may, however, be mentioned. 

The presence of definite periods of interest in this problem together 
with the evolution in types of experimental material are indicated in 
Fig. 1. The early period of investigation following the pioneer work of 
Steffens with verbal material is indicated, together with the decreased 
interest for more than a decade. Then, with the introduction of a 
different type of material and a different result by Pechstein, interest 
rose rapidly. The height of the curve is governed by the number of 
studies reported during each five-year period. 

Figure 2, indicating the result at the various stages, shows still more 
clearly the controversial nature of the issue. The pendulum swings 
from early unanimity for the whole method to a preponderance of 
evidence for the part method, and back again part of the way to an 
inconclusive position. The comparison between date of introduction 
of new types of material and date of reversal of experimental finding is 
of interest. Recently there has been a tendency to state results in 
terms of specific conditions, the most economical method varying with 
these conditions. 

Of the experiments covered, nineteen used the memorization of 
poetry, seven used serial nonsense syllables, seven used paired associates 
of various types, five used mazes, two used puzzles, two used the reading 
of content, two used piano-playing, and one each used finger sequences, 
mirror drawing, card sorting, letter-number substitution, square- 
tracing, geometric figures, social science assignments, typing, language, 
and shorthand. In no case where more than one investigator has 
worked with the same material has there been complete agreement 
as to result. 

There is fair agreement among the authors that the part method is 
the spontaneous and preferred procedure, that increased age on the 
part of the subjects favors the whole method, that high intelligence 
favors the whole method, that the whole method is more economical
with continuous, meaningful, logically related material and the part
method with material of very unequal difficulty, and that massing of
practice weights the situation in favor of the part method. There is no
greater individual variation, and what the relative merits of the
methods are under conditions of fatigue. 

There are certain general criticisms which may shed light on the 
reason for the diversity of results. (1) Nowhere is there a definition of 
“whole” or “part” except in terms of length of unit. The studies
have been, therefore, largely studies of attention-span and
memory-span rather than studies of the whole-part problem proper. (2)
There is no agreement in definition of “economy of learning.” Some
experimenters used immediate recall, some delayed recall; some based
their conclusions on errors, some on time scores. In the non-verbal
laboratory experiments the failure to check retention is particularly 
marked. (3) Most of the procedures emphasize the latter or habitua- 
tion section of the learning curve at the expense of the earlier or 
perceptual part. It is with the perceptual section that modern educa- 
tion is primarily concerned. (4) In many cases there was lack of 
proper experimental control. Small populations, failure to separate 
the problem from related factors, failure to equate practice, and similar 
shortcomings are reflected in the relatively small number of studies 
which have so far yielded statistically reliable results. 

Recently the problem of units of presentation for learning has 
gained increased significance because of two developments, one in the 
field of psychology and the other in the field of educational method. 
The first is the concept of wholeness contributed by Gestalt psychology, 
the second the so-called progressive education movement which 
acknowledges John Dewey as its leader. Perhaps a brief statement of 
the concepts of the learning process of these two groups will clarify the 
issue. 

For Dewey, learning is problem-solving. It is active, not passive; 
it does not consist merely in the acquisition of a body of information or 
the mastery of skills. It is based on the belief that “All which the
school can or need do for pupils, . . . is to develop their ability to
think,”¹ and that thinking includes “ . . . the sense of a problem, the
observation of conditions, the formation and rational elaboration of a
suggested conclusion, and the active experimental testing.”² Learning
is “ . . . intentional purposeful activity controlled by perception of
facts and their relationships to one another.”³ He has even gone so far
as to claim “ . . . psychological investigations have proved that
learning is better and faster when the learner understands his problem
as a whole . . . ”⁴ in spite of the confused state of the whole-part
issue.

The Gestalt psychologists are also interested in learning as
problem-solving, not in learning as the habituation of response whose first
appearance is accounted for in terms of chance. “The nature of





1 Dewey, John: Democracy and Education, p. 179.

2 Ibid., p. 177.

3 Ibid., p. 120.

4 Dewey, John: “Why Have Progressive Schools?” Current History, Vol.
XXXVIII, 1933, p. 446.
mental development . . . is not the bringing together of separate
elements but the arousal of more and more complicated
configurations.”¹ Learning includes more than memory; it involves not only
finding out how the successive performances depend upon the first, but 
also how the first comes about. Insight is the keynote: one does not 
profit by experience unless that experience leads to insight. The 
learning process is conceived as the resolution of tensions set up in the 
learner because of the presentation of a problem. Learning involves a 
reorganization of the perceptual field so that the configuration is seen as 
leading toward a goal. 

In spite of the confusion surrounding the whole-part problem and 
the complexity of the factors involved, the issue is still an important 
one. Further, Gestalt psychology offers a new approach. Granting 
that no one type of presentation may be superior for all ages, all intelli- 
gence levels, all purposes, or all materials, it still seems that there may 
be some criterion for the selection of material which might yield signif- 
icant results. With a relatively homogeneous group and with problems 
designed to meet a clear-cut definition of what constitutes a qualitative 
whole, there is still the possibility of discovering a fundamental 
principle. 

What, then, is a qualitative whole? The best descriptions are
given in the literature of the Gestalt psychologists. Koffka defines a
whole as “ . . . a coexistence of phenomena in which each member
possesses its peculiarity only by virtue of, and in connection with, all
the others . . . ”² Köhler gives to it the qualities of clearness,
organization, definiteness, entity and individuality, regularity,
symmetry, and solidity. A “Gestalt means any segregated whole.”³
There is functional coherence of structure and process. Form is the
most important visual property of a whole, meaning automatically
producing form where there was none before. According to Wheeler,
any whole possesses a unity which is lost when it is reduced to its parts.⁴
Ellis interprets the term whole to be “ . . . a name used to signify
that which is pre-analytically what it is disregardful of what a
post-analytical discrimination may reveal to be its parts.”⁵ Higginson
describes a whole as an integrative, unitary structure whose parts





1 Koffka, Kurt: Growth of the Mind, p. 356.

2 Koffka, Kurt: Op. cit., p. 131f.

3 Köhler, Wolfgang: Gestalt Psychology, p. 192.

4 Wheeler, R. H.: Science of Psychology, p. 153.

5 Ellis, W. D.: Gestalt Psychology and Meaning, p. 145.

542 The Journal of Educational Psychology


gather significance simply from their membership in a functional
outcome.¹ Helson says a Gestalt is a relatively independent system,
isolated and autonomous. It is not explainable by parts and their
relations; the whole is suprasummative. The whole may imply the
parts, but the parts do not imply the whole. A homogeneous field may
not be called a whole because halving the field leaves the whole
unchanged. “If configurational structures possess specific properties
which are supra-summative and belong to them as wholes, then these
properties are lost in analysis . . . Since the whole possesses its own
specific properties we can never tell in advance from a knowledge of
the parts what the whole will be or how it will behave.”² Such
configurations are governed by internal laws “ . . . as opposed to the
summative, contingent, spatio-temporal contiguities of meaningless
elements. Only in this way can the whole be explained as a rational,
meaningful structure.”³ “A description of parts in relation does not
reproduce the configuration with its specific properties as a whole.”⁴
Parts cannot be isolated from wholes and remain true parts.⁵

A whole, then, in the Gestalt sense, is an organic structure, not a 
multiplicity of atomic structures such as that represented by a list of 
nonsense syllables. An analysis of the available definitions suggests 
that a whole has three primary qualities. (1) It is definitely segregated. 
It is relatively independent, isolated, and autonomous, possessing its 
own characteristic individual entity. While the learner perceives a true 
whole, other wholes are excluded from existence for him. (2) It 
possesses “form-quality,” or unity built around a central function. 
It is an organized, functional unit with decisive internal dynamic 
relations. It possesses clearness, definiteness, solidity, regularity, 
harmony, coherence and symmetry. (3) It is more than a sum of 
its parts; it is a rational structure. Hence a homogeneous field cannot 
constitute a whole, because taking out one part of it leaves the whole 
unchanged. A whole is causally coherent, a unit in which changing one 
section changes the entire structure. 

Going back to the wide variety of types of material that have been 
used, one cause of disagreement is apparent. Problems of a largely 





1 Higginson, G. D.: “Visual Perception in the White Rat.” Journal of 
Experimental Psychology, Vol. IX, 1926, pp. 337-347. 

2 Helson, H.: “Psychology of Gestalt.” American Journal of Psychology, 
Vol. XXXVI, 1925, p. 347. 

3 Ibid., p. 347. 

4 Ibid., p. 356. 

5 Ibid., p. 361. 
rote or mechanical nature would possess little form, hence few of the 
attributes of wholes. Poems, content, and the like might or might not 
constitute wholes, depending entirely upon their internal organization. 
Comparatively few of the materials meet the Gestalt definition of a 
whole. 

It is also apparent that, with a definition of wholeness in qualitative 
rather than quantitative terms, the relationship between the whole- 
part problem and such fields as context versus list presentation, logical 
versus rote learning, the influence of meaningful organization, problem- 
solving and the like is definitely increased. 

Elsewhere in the literature there will follow two series of laboratory 
experiments and one series of classroom experiments intended to 
determine whether material constructed to meet the Gestalt definition 
sheds any light on the whole-part problem. 
THE ROLE OF THE BASAL METABOLIC RATE IN THE 
INTELLIGENCE OF NINETY GRADE-SCHOOL 
STUDENTS 


RALPH T. HINTON, JR. 
Northwestern University 


It is rapidly becoming more and more apparent through the 
researches of recent years that the basal metabolic rate is one of the 
most important factors in the life cycle of man. Experiments and 
clinical observations have shown what its effects are physically, but 
comparatively little is known about the mental side of the problem, 
except, of course, in abnormal cases. 

The present problem grew out of the opinion of medical acquaint- 
ances of the author that a low metabolic rate in children is associated 
with a low mental level. It consists in a quantitative determination of 
the relationship between the basal metabolic rate and intelligence in 
children who were negative to any pathological conditions. 


PROCEDURE 


In all, ninety subjects were used, their ages ranging between five 
and fifteen years. There was no conscious selection of cases, and the 
only requirement was that each child pass a very thorough physical 
examination. This was done to eliminate any factor other than a 
thyroid disturbance that might have a bearing on the results. It is 
universally recognized that, other things being equal, the basal meta- 
bolic rate is one of the most reliable criteria of the activity of this 
gland. In this way it was hoped that the results would become more 
meaningful than if several other factors were involved. 

The conditions of the experiment itself were quite simple. Thirty 
of the children came from an orphanage in Evanston, Illinois; they were 
kept in bed the morning of the testing until the arrival of the examiner, 
when they were taken by car to the psychology building of North- 
western University, where once more they were put to bed for an hour 
and a half. Following this rest period, the basal metabolisms were 
taken. Three tests were made on each child; the first was discarded, 
and the average of the last two was taken as the b.m.r. 
After this, the Stanford-Binet and Arthur Point-Performance mental 
tests were given. 

The remaining sixty children came from private schools in Kankakee 
and Manteno, Illinois. In these instances, the subjects were kept in 
bed until the arrival of the examiners and the tests took place there. 
The procedure here was further varied by the fact that two assistants 
took care of the metabolisms and performance tests, while the author 
administered the Binet; in Evanston, it had been necessary for him to 
do all the examining. In every case, however, in order to insure true 
basal conditions, no breakfasts were allowed the children. 

In the following tables appear the results of this investigation. 


TABLE I. THE CORRELATIONS, MEANS, AND STANDARD DEVIATIONS FOR THE 
TOTAL GROUP AND THE MEANS FOR THE SEPARATE SEXES FOR (1) BINET, 
(2) ARTHUR PERFORMANCE AND (3) METABOLISM SCORES 

      Total group (N = 90)                                  Males (N = 43)   Females (N = 47)
      1             2       3       M             σ         M                M
1     ....          ....    ....    [illegible]   15.43     103.13           100.10
2     [illegible]   ....    ....    99.00         18.11     101.05            97.13
3     .736          .661    ....    -5.85         10.70      -5.35            -6.30


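
The legible entries of Table I can be checked against one another. The sketch below assumes that each total-group mean is simply the mean of the two sex-group means weighted by their numbers of cases (the article does not say how the means were computed); the function name is illustrative only. The Binet figure it prints is an implied value, since the total-group entry cannot be read in the table.

```python
# A rough consistency check on Table I, assuming total-group means are
# case-weighted averages of the sex-group means (an assumption, not
# something the article states).

def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two subgroup means, weighted by subgroup size."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)

N_MALES, N_FEMALES = 43, 47  # from the table heading

performance = pooled_mean(101.05, N_MALES, 97.13, N_FEMALES)
metabolism = pooled_mean(-5.35, N_MALES, -6.30, N_FEMALES)
binet_implied = pooled_mean(103.13, N_MALES, 100.10, N_FEMALES)

print(f"Arthur Performance: {performance:.2f} (table gives 99.00)")
print(f"Metabolism:         {metabolism:.2f} (table gives -5.85)")
print(f"Binet, implied only: {binet_implied:.2f} (table entry illegible)")
```

The two legible total-group means are reproduced to the second decimal place, which suggests that the male and female columns have been read correctly.
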
DISCUSSION 


This is but a preliminary study, and the author well realizes that, 
with the small number of present cases, any sweeping statements on 
the problem would be unwarranted. But if the Binet and Arthur 
Point-Performance mental tests measure that elusive thing known as 
intelligence, and it is generally agreed that they do, then the results 
of this investigation would indicate that there might be some sort of a 
relationship between metabolism and intelligence, at least as far as 
children are concerned. However, when one considers that the group 
of subjects was entirely unselected, the results assume more meaningful 
proportions. Consequently, we do not feel it an over-statement of the 
importance of the results in saying that basal metabolism does play a 
part in the mental activities of the so-called normal person who is 
negative to any pathological conditions other than an advanced or 
decreased thyroid state. 

This naturally brings up the point why we were so careful to elimi- 
nate factors other than thyroid disturbances. If subjects had been 
used regardless of their physical health, the same distribution of 
metabolic scores might have been secured, but they might have meant 
something quite different. It is known that other conditions have a 
bearing on this rate, but it is also known that no other disturbance 
gives the same basal readings day after day, and month after month; 
these other factors cause a fluctuation rather than allowing it to remain 
constant. This being so, an unexamined child would not have given a 
true indication of the relation between the basal metabolic rate and 
intelligence, and we would have been correlating changing against 
stable scores. Therefore, selecting only those subjects who had no 
other afflictions, we were able to compare a permanent mental 
characteristic, the IQ, with the only factor which gives a constant 
metabolic rate: the activity of the thyroid gland.¹,² 

It was not so much the fact that low basal readings were associated 
with low intelligence, which made the results interesting, as it was that 
the lowest metabolism score did not give the lowest IQ, nor did the 
highest positive metabolism give the highest IQ. Moreover, above and 
below certain points the basal ratings and intelligence scores tended to 
be more or less clear-cut and well-defined, but in the middle ranges 
there tended to be a great deal of overlapping. Above IQ one hundred 
twenty on the Binet tests there was not one minus metabolism, and of 
these ten, only four were outside the normal basal limit of plus ten; 
and the Performance results showed only one minus metabolism, and 
that outside the normal limits, associated with scores of one hundred 
twenty or above. Yet of these nine scores, only two were above the 
normal limit of plus ten. Below the normal grouping of ninety to 
one hundred ten, the results were just as well proportioned. There 
was not one Binet IQ with a plus basal in this section, and only one 
Performance IQ on the positive side, although in both instances the 
basal ratings extended far into the subnormal territory. However, 
between IQs ninety to one hundred ten on both tests, the great over- 
lapping came. Minus and positive basals were almost equally 
numerous, but only in one case, and that a Performance score, was the 
metabolic rate below minus twenty. 

From all this it can be inferred that basal metabolism correlates 
fairly closely with the IQs, with positive and negative scores in the 
normal and slightly above normal limits. This, in part, would seem to 
agree with the clinical pictures of physical activities associated with 
metabolism disorders; the slow, underactive type that results from a 
low basal metabolism, and the overactive personality that goes along 
with a positive metabolic rate. 





1 Crile, G. W.: The Thyroid Gland. W. B. Saunders Co., 1922. 
2 Kendall, E. C.: Thyroxine. The Chemical Catalog Co., 1929. 
However, there are two discrepancies to be noted in these results. 
In the first place, there were several cases that scored IQs between one 
hundred ten and one hundred twenty on both tests and yet had minus 
metabolisms that ranged all the way from slightly subnormal to well 
below the normal limit. The author knows of no definite explanation 
for this, but the most obvious reason is that the various metabolic 
disturbances had not been of long enough standing to be recorded in the 
person’s mental activities. This would be borne out by the fact that, in 
spite of the great extent of information about thyroid disorders, no one 
knows either how long a condition has been going on (unless, of course, it 
reaches very abnormal proportions), or when the gland is going to start 
to underfunction or become hyper-active. 

Another unusual occurrence is found in the two correlations, the 
Performance being lower than the Binet. This is peculiar, because one 
of the symptoms of metabolic disorders is a heightened or lowered 
physical response, and one would think that, of the two tests used, the 
Performance would illustrate this more clearly. Perhaps the explana- 
tion just discussed would hold here; or perhaps the metabolic disturb- 
ances of the subjects in question were not severe enough to be recorded 
in physical activity. It may also be that mental faculties are more 
susceptible, and, as such, are the first to be upset. But this is merely 
speculation. 

A further point of interest lies in the fact that several times during 
the course of the experiment correlations were made between the scores, 
each time with an increased number of cases, and the coefficients were 
as high as the final results. 


CONCLUSIONS 


(1) This is a study of the quantitative relation between the basal 
metabolic rate and intelligence of grade-school students, who were 
negative to any pathological disorders other than thyroid disturbances. 
A further survey will be made later. 

(2) In the ninety cases of school children, between the ages of five 
and fifteen years, presented in this experiment, the correlation of 
.736 (PE, .032) between the results of the basal metabolic tests and the 
Binet mental test scores, and the correlation of .661 (PE, .040) between 
the results of the Arthur Point-Performance Scale and the results of the 
basal metabolic rates (as computed on the Thurstone Correlation 
Data Sheet), indicate that there might be a positive connection 
between thyroid gland activity, as measured by the basal metabolic 
rate, and intelligence in persons who are negative to any 
pathological conditions. 

(3) The distribution of the basal metabolic rates and intelligence 
test scores presented here indicates that the limits for best mental 
activity would seem to be between 0 and plus ten metabolic rates. 
By the same distributions, the limits for lowest mental activity appear 
to be between minus five and minus twenty-three metabolic rates. 

(4) The distribution of the basal metabolic rates and intelligence 
tests scores presented here would seem to indicate that between the 
normal mental limits (IQs ninety to one hundred ten) there is a great 
spreading and overlapping of basal metabolic figures. 

(5) It would seem from these results that there is less connection 
between basal metabolic rates and motor intelligence (as measured by 
Performance test) than there is between basal metabolic rates and 
abstract intelligence (as measured by the Binet test). 

(6) The general average of the basal metabolic rates is inclined to 
be slightly on the minus side of the metabolism chart, somewhere in the 
neighborhood of minus five. 

(7) The general average of intelligence tests scores and basal 
metabolic ratings for girls, as presented here, is slightly below that for 
boys, although these differences are so slight as to be almost negligible. 
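
The probable errors quoted in conclusion (2) can be roughly reproduced. The sketch below assumes the conventional formula PE = 0.6745(1 - r²)/√N; the article does not state which formula was used, and the function name is illustrative only.

```python
import math

def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient.

    Assumes the conventional formula 0.6745 * (1 - r^2) / sqrt(n);
    the article itself does not name the formula it used.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

N_CASES = 90  # number of children in the study

pe_binet = probable_error_of_r(0.736, N_CASES)        # reported as .032
pe_performance = probable_error_of_r(0.661, N_CASES)  # reported as .040

print(f"Binet vs. metabolism:       PE = {pe_binet:.4f}")
print(f"Performance vs. metabolism: PE = {pe_performance:.4f}")
```

Under this assumption both values agree with the reported probable errors to within about .001, and each coefficient is many times its probable error.
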








NOTE ON THE INTERPRETATION OF COEFFICIENTS 
OF CORRELATION 


WALTER S. MONROE 
University of Illinois 


If two variables are correlated, each one may be thought of as 
analyzable into a common factor and a specific factor, i.e., 


$$x_0 = c_0 a + b_0$$
$$x_1 = c_1 a + b_1 \qquad (1)$$


If $x_1$ represents a cause of $x_0$, $c_0 a$ is the contribution of $x_1$ to the variance 
of $x_0$. If a causal relationship does not exist between the two variables, 
$c_0 a$ is a factor of $x_0$ which is perfectly correlated with $c_1 a$, the 
corresponding factor of $x_1$. In either case the variance ratio, $c_0^2\sigma_a^2/\sigma_0^2$, is a 
measure of the proportion of the variability (variance) of $x_0$ that is due 
to the common factor. Less precisely, this variance ratio is a measure 
of the “contribution” of $x_1$ to $x_0$. 

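The value of the variance ratio can be studied more easily if the 
above equations are expressed in standard score form. Dividing both 
sides of the first equation by $\sigma_0$, we have 


$$\frac{x_0}{\sigma_0} = \frac{c_0 a}{\sigma_0} + \frac{b_0}{\sigma_0}$$


Multiplying the first fraction of the right-hand member by $\sigma_a/\sigma_a$ 
and the second by $\sigma_{b_0}/\sigma_{b_0}$, we obtain 


$$\frac{x_0}{\sigma_0} = \frac{c_0\sigma_a}{\sigma_0}\cdot\frac{a}{\sigma_a} + \frac{\sigma_{b_0}}{\sigma_0}\cdot\frac{b_0}{\sigma_{b_0}}$$


By introducing symbols for the fractions, we may write 


$$z_0 = w_0 a' + w_1 b_0'$$


Similarly, 


$$z_1 = v_0 a' + v_1 b_1' \qquad (2)$$

$$\sigma_{a'}^2 = \frac{\sum a'^2}{N} = \frac{1}{\sigma_a^2}\cdot\frac{\sum a^2}{N} = \frac{\sigma_a^2}{\sigma_a^2}$$

$$\sigma_{a'} = 1.00$$


Similarly it can be shown that $\sigma_{b_0'} = \sigma_{b_1'} = \sigma_{z_0} = \sigma_{z_1} = 1.00$. 
Also 


$$r_{01} = \frac{\sum z_0 z_1}{N} = \frac{\sum (w_0 a' + w_1 b_0')(v_0 a' + v_1 b_1')}{N}$$


Since $a'$, $b_0'$, $b_1'$ are uncorrelated, the last three terms of the above 
expression, when expanded, are equal to zero. Hence, 


$$r_{01} = \frac{\sum w_0 v_0 a'^2}{N} = w_0 v_0 \frac{\sum a'^2}{N} = w_0 v_0 \sigma_{a'}^2$$

$$r_{01} = w_0 v_0 \qquad (3)$$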
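
The result that the correlation equals the product of the two common-factor weights can be illustrated numerically. The sketch below is a simulation under stated assumptions, not part of the original note: the three factors are drawn as independent standard normal variables, the specific-factor weights are chosen so that each composite has unit variance, and the weights .80 and .50 are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulated_r(w0, v0, cases=50_000, seed=1):
    """Build z0 and z1 from one common and two specific unit-variance factors."""
    rng = random.Random(seed)
    w1 = math.sqrt(1.0 - w0 ** 2)  # keeps the variance of z0 at unity
    v1 = math.sqrt(1.0 - v0 ** 2)  # keeps the variance of z1 at unity
    z0, z1 = [], []
    for _ in range(cases):
        a, b0, b1 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        z0.append(w0 * a + w1 * b0)
        z1.append(v0 * a + v1 * b1)
    return pearson_r(z0, z1)

w0, v0 = 0.80, 0.50  # hypothetical common-factor weights
r_sim = simulated_r(w0, v0)
print(f"simulated r01 = {r_sim:.3f}; product w0 * v0 = {w0 * v0:.3f}")
```

With a large number of simulated cases the sample correlation should fall close to the product of the weights, differing only by sampling fluctuation.
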


The variance ratio for standard score equations is $w_0^2\sigma_{a'}^2/\sigma_{z_0}^2$, 
which simplifies to $w_0^2$. This expression for the variance ratio may 
also be obtained from the definition of $w_0$ as a symbol for $c_0\sigma_a/\sigma_0$. 

The square of the standard deviation of the sum of two uncorrelated 
variables is equal to the sum of the squares of the standard deviations 
of the two variables. Hence, 


$$\sigma_{z_0}^2 = w_0^2\sigma_{a'}^2 + w_1^2\sigma_{b_0'}^2$$
$$\sigma_{z_1}^2 = v_0^2\sigma_{a'}^2 + v_1^2\sigma_{b_1'}^2$$


Since all of the $\sigma$'s are equal to unity, these equations simplify to 


$$1 = w_0^2 + w_1^2$$
$$1 = v_0^2 + v_1^2 \qquad (4)$$


By squaring (3), the expression for the coefficient of correlation, 
we have 


$$r_{01}^2 = w_0^2 v_0^2 \qquad (5)$$


Since there are four variables and only three equations, no general 
solution is possible, but the value of $w_0^2$ may be obtained for the case 
where $v_0 = w_0$ and for the case where $v_0 = 1.00$. When $v_0^2 = 1.00$, 
$w_0^2 = r_{01}^2$. This is the minimum value of the variance ratio for a 
given coefficient of correlation. If $v_0^2 = w_0^2$, $(w_0^2)^2 = r_{01}^2$ or $w_0^2 = r_{01}$. 
If $v_0^2$ is less than $w_0^2$, $w_0^2$ is greater than $r_{01}$. This follows 
immediately from the relationship $w_0^2 v_0^2 = r_{01}^2$. If $v_0^2$ is replaced by $w_0^2$, 
a greater value, then $(w_0^2)^2$ is greater than $r_{01}^2$ and $w_0^2$ is greater than 
$r_{01}$. Since $w_0^2 + w_1^2 = 1.00$, the maximum value of $w_0^2$ is 1.00. 
Hence, the limits of the value of the variance ratio, $w_0^2$, are $r_{01}^2$ and 
1.00. For a given coefficient of correlation, say $r_{01} = .60$, the per cent 
of the variance of the variable taken as dependent due to the common 
factor may be as small as 36 or as large as 100, and the value depends 
upon the factor pattern of the two variables. Hence, the interpretation 
of a coefficient of correlation in terms of the communality of the 
two variables requires that the nature of the factor pattern be 
determined or that the interpretation be based upon an assumed factor 
pattern. 
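
The limits just described can be traced numerically for the case of a correlation of .60. The sketch below solves equation (5) for the variance ratio under an assumed value of the squared weight of the common factor in the other variable; the function name and the tolerance used in the range check are illustrative choices, not part of the original note.

```python
def variance_ratio(r01, v0_squared):
    """Solve r01^2 = w0^2 * v0^2 for w0^2, given an assumed v0^2.

    v0^2 must lie between r01^2 and 1.00, since w0^2 cannot exceed 1.00.
    """
    r_squared = r01 * r01
    tolerance = 1e-12  # guards against floating-point rounding at the limits
    if not (r_squared - tolerance <= v0_squared <= 1.0 + tolerance):
        raise ValueError("v0^2 must lie between r01^2 and 1.00")
    return r_squared / v0_squared

r01 = 0.60
lower = variance_ratio(r01, 1.00)      # other variable is pure common factor
symmetric = variance_ratio(r01, 0.60)  # case in which v0^2 equals w0^2
upper = variance_ratio(r01, 0.36)      # dependent variable is pure common factor

for label, value in [("minimum", lower), ("symmetric", symmetric), ("maximum", upper)]:
    print(f"{label:9s} w0^2 = {value:.2f}")
```

The three cases give variance ratios of about .36, .60, and 1.00, which are the lower limit, the equal-weights value, and the upper limit named in the text.
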

In considering the application of the partial correlation technique, 
it has been suggested that chronological age “contributes itself 
completely” to certain variables. In other words, when the independent 
variable is chronological age, it appears reasonable to assume, at least 
in certain cases, that $v_1 = 0.00$, and hence that $v_0 = 1.00$. Under this 
condition, the value of the variance ratio would be equal to $r_{01}^2$. For 
example, if for a given population the correlation between achievement 
test scores and chronological age is .40, then .16 of the variance of the 
scores is “contributed” by chronological age. When the correlation 
is between the scores yielded by equivalent forms of a test, i.e., if we 
have a coefficient of reliability, $w_0$ may be assumed to be equal to $v_0$. 
Hence, in this case $w_0^2$ is equal to $r_{01}$. This assumption is probably 
approximated in the case of the scores on a group intelligence test and 
those yielded by a general achievement test. 

If we consider the correlation between measures of teaching success 
and scores on a general intelligence test, it probably would not be 
unreasonable to assume that only a minor phase of intelligence 
contributes to success in teaching, i.e., that $v_1$ is materially greater than $v_0$. 
For example, in a given population of teachers the factor pattern 
might be 


$$z_0 = .80a + .60b_0$$
$$z_1 = .20a + \sqrt{.96}\,b_1$$


For this factor pattern $r_{01} = .16$, which is a fairly typical value for 
these variables. The value of the variance ratio, $w_0^2$, is .64. This 
means that although only a minor phase of what the intelligence test 
measures “contributes” to the measures of teaching success, this 
phase produces .64 of the variance of the measures. This assumed 
factor pattern may not be valid, but the illustration makes clear the 
possibilities in interpreting low coefficients of correlation. 
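
The figures of this illustration can be verified directly from the assumed weights. The sketch below does no more than the arithmetic of equations (3) and (4); the variable names are illustrative, and the squared correlation is printed only for contrast with the variance ratio.

```python
import math

# Weights of the assumed factor pattern for teaching success (z0)
# and intelligence-test score (z1).
w0, w1 = 0.80, 0.60
v0, v1 = 0.20, math.sqrt(0.96)

variance_z0 = w0 ** 2 + w1 ** 2  # should be unity, by equation (4)
variance_z1 = v0 ** 2 + v1 ** 2  # should be unity, by equation (4)
r01 = w0 * v0                    # equation (3)
ratio = w0 ** 2                  # variance ratio for z0

print(f"variance of z0 = {variance_z0:.2f}, variance of z1 = {variance_z1:.2f}")
print(f"r01 = {r01:.2f}, r01 squared = {r01 ** 2:.4f}, variance ratio = {ratio:.2f}")
```

The contrast between the squared correlation (about .03) and the variance ratio (.64) is the point of the illustration: under this pattern the coefficient greatly understates the share of variance traceable to the common factor.
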





1 It is unfortunate that the limits of the variance ratio have been given as $r_{01}^2$ 
and $r_{01}$. The author of this note is guilty of perpetuating this error. See Monroe, 
W. S. and Stuit, D. B.: “The Interpretation of the Coefficient of Correlation.” 
Journal of Experimental Education, Vol. I, March, 1933, pp. 186-203. 

For earlier incorrect statements, see Dunlap, J. W. and Cureton, E. E.: “On 
the Analysis of Causation.” Journal of Educational Psychology, Vol. XXI, 
December, 1930, pp. 673-675. 

Tryon, R. C.: “The Interpretation of the Correlation Coefficient.” 
Psychological Review, Vol. XXXVI, 1929, pp. 419-445. 


NOTE CONCERNING GROUP INFLUENCE UPON OTIS 
S.-A. TEST SCORES 


WILLIAM C. F. KRUEGER 


Wayne University, Detroit, Michigan 


This experiment was designed to note the influence of the group 
upon Otis S.-A. test scores. Usually the test is given to a group of 
students, and occasionally it is necessary to give the test later to those 
who happened to be absent at the time the group took the test. It 
was of interest to observe whether the presence of the group had any 
influence upon the individual test score. 

Farnsworth¹ found that the group influence was negligible when the 
test was given to a small group. Since only twenty subjects were used 
by Farnsworth, it seemed worthwhile to study this problem with larger 
groups. One hundred and sixty college students, mostly sophomores, 
participated. The subjects were divided into four sections of forty 
students each. Forms A and B of the Otis S.-A. Higher Form were 
used. 

The group tests were taken in the large recitation rooms, while the 
individual tests were given in the smaller laboratory rooms. Members 
of Section I were given Form A as an individual test, then took Form B 
together as a group. Members of Section II took Form B first indi- 
vidually, then Form A as a group. Section III took Form A as a 
group first, then later Form B individually; and Section IV took 
Form B as a group test first, and then Form A individually. By using 
this method of counterbalancing it was possible to determine practice 
effects. When taking the group tests, the subjects were seated in a 
manner to make copying impossible. The students were not informed 
of the purpose of taking the tests. They considered that it was 
regular routine work for the beginning course in Psychology. 

Four groups of scores were obtained: (a) Individual test scores 
before group tests; (b) individual scores after group testing; (c) group 
test scores before individual testing; and, (d) group test scores after 
individual testing. All averages are based upon eighty individual 
scores. 

When the individual tests were taken first, the average score was 
60.93, with the SD 6.52, and the PE .33. When the individual test 
followed the group testing, the average test score was 65.63, with the 





1 Farnsworth, P. R.: “Concerning So-Called Group Effects.” Journal of 
Genetic Psychology, Vol. XXXV, 1928, pp. 587-594. 
SD 5.52 and the PE .29. The difference between these two means was 
4.70. This increase in test score may be attributed to practice effect. 
The reliability of the difference between the two means was 10.7, and 
therefore was statistically significant. (This result coincides with the 
increase in test score due to practice suggested by Otis.) 

When the group test was taken first, the average test score was 
61.43, with the SD 6.28 and the PE .32. When the group tests fol- 
lowed the individual tests, the average score was 65.21, with the SD 
5.37 and the PE .27. The difference between these two means was 
3.78. This increase may also be ascribed to practice effects. The 
reliability of the difference between the two means was 9.00, and 
therefore was statistically reliable. 

When we calculated the difference between the average for the 
individual tests and the average for the group tests, when both tests 
were the initial tests, the group tests yielded a slightly higher average. 
However, it must be noted that the difference of .50 is of little 
significance since the reliability of the difference between the two means is 
only 1.09. 

When the individual and the group tests both followed the initial 
tests, the average for the individual tests was slightly higher. This 
difference of .42 was also of little significance since the reliability of the 
difference between the two means was only 1.05. 
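
The four "reliabilities" reported above appear to be critical ratios of the usual kind, that is, the difference between two means divided by the probable error of that difference. The sketch below assumes this reading, and further assumes that the probable error of a difference between independent means is the square root of the sum of the squared probable errors; neither formula is stated in the note, and the function name is illustrative only.

```python
import math

def critical_ratio(mean_a, pe_a, mean_b, pe_b):
    """Difference between two means divided by its probable error.

    Assumes independent groups, so that PE_diff = sqrt(PE_a^2 + PE_b^2).
    """
    pe_difference = math.sqrt(pe_a ** 2 + pe_b ** 2)
    return abs(mean_a - mean_b) / pe_difference

# Means and probable errors as reported in the note.
INDIVIDUAL_FIRST = (60.93, 0.33)
INDIVIDUAL_SECOND = (65.63, 0.29)
GROUP_FIRST = (61.43, 0.32)
GROUP_SECOND = (65.21, 0.27)

practice_individual = critical_ratio(*INDIVIDUAL_FIRST, *INDIVIDUAL_SECOND)
practice_group = critical_ratio(*GROUP_FIRST, *GROUP_SECOND)
group_effect_initial = critical_ratio(*INDIVIDUAL_FIRST, *GROUP_FIRST)
group_effect_second = critical_ratio(*INDIVIDUAL_SECOND, *GROUP_SECOND)

print(f"practice, individual tests: {practice_individual:.2f} (reported 10.7)")
print(f"practice, group tests:      {practice_group:.2f} (reported 9.00)")
print(f"group effect, first tests:  {group_effect_initial:.2f} (reported 1.09)")
print(f"group effect, second tests: {group_effect_second:.2f} (reported 1.05)")
```

Under these assumptions the computed ratios fall within a few hundredths of the reported figures, which supports the reading that the practice effects are many times their probable errors while the group effects are scarcely larger than theirs.
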

Our results apparently justified the conclusion that the presence of 
the group had little influence upon the Otis S.-A. Higher Form tests. 
Thus, persons taking the tests separately from the group would get 
about the same test scores as though they had been with the group. 





ERROR IN “ERROR IN THE USE OF THE STANDARD 
ERROR” BY W. R. VAN VOORHIS 


J. D. LEITH 


University of North Dakota 


In the March, 1935, issue of this Journal, page 228, W. R. Van 
Voorhis speaks of misuse of the formula 





$$\sigma_{m_1 - m_2} = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2 - 2 r_{12}\sigma_{m_1}\sigma_{m_2}} \qquad (1)$$


through “usage of the standard deviations of the observed samples 
instead of those of the parent populations from which the samples 
were drawn whenever that information is available. Thus, in the 
formula [quoted] the two $\sigma_m$'s actually involve the standard deviations 
of the total distributions from which the compared samples were 
taken. Furthermore, if the two samples are themselves drawn from 
the same totality, $\sigma_1$ and $\sigma_2$ are never distinct but refer to the one true 
standard deviation.” 

If it be known that “the two samples are themselves drawn from 
the same totality” there is no need of determining the standard error 
of the difference between sample means, since in this case it is known in 
advance that any observed difference in sample means is accountable 
for on the basis of chance in sampling alone. 

The sole purpose in computing the critical ratio is to measure the 
probability that a difference as large as the observed difference would 
be obtained if the samples were drawn from the same statistical 
population, in the face of an established presumption (possibly very 
mild) that they are not so drawn. No discussion which rests on the 
supposition of a unit parent population for both samples can apply to a 
situation in which the whole point in question is that of the probability 
of the existence of two distinct statistical populations, one represented 
by one sample; the second by the other sample. 

Consider, for example, a hypothetical experimental situation, 
furnished by two equal groups of jumping frogs, one serving as experi- 
mental group (x), the other as control (c). Let the distribution of 
weights be normal in each case, and suppose the groups to be matched 
in “man-to-man” fashion, so that the standard deviations are equal 
and (Pearson) $r_{xc} = 1$. The purpose of the experiment is to determine 
the effect on weight of a meal of buckshot, administered in equal 
quantities to each member of the experimental group. At the 
conclusion of the experiment, it is found that each weight (and the mean) 
for the experimental group is greater by some positive constant; there 
is no change in the control group; the standard deviations for the two 
groups are still equal; and again $r_{xc} = 1$. 

It is clear that formula (1) furnishes a correct result in this case, 
since it reduces at once to give 


$$\sigma_{m_1 - m_2} = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2 - 2\sigma_{m_1}\sigma_{m_2}} = \sqrt{(\sigma_{m_1} - \sigma_{m_2})^2} = 0$$


Thus, whatever the observed difference in means, the critical ratio is 
infinite. This result corresponds to a state of certainty that the 
observed difference is statistically significant, i.e., that the experimental 
and control groups, at the end of the experiment, do not represent 
equivalent statistical populations as they did (equally) at the 
beginning. 

On the other hand, Professor Van Voorhis’ formula (4) yields 
$\sigma_{m_1 - m_2} \neq 0$, resulting in a very conservative critical ratio indeed, 
under the conditions postulated! 
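
The contrast drawn here can be put in numbers. The sketch below evaluates formula (1) for a hypothetical pair of matched groups whose means each have a standard error of 2.0 (an invented figure, used only for illustration), first with perfect correlation, as in the frog example, and then with zero correlation, which is the condition under which formula (4) was derived. The function name is likewise an assumption of the sketch.

```python
import math

def standard_error_of_difference(se_m1, se_m2, r12):
    """Formula (1): standard error of the difference between two means."""
    radicand = se_m1 ** 2 + se_m2 ** 2 - 2.0 * r12 * se_m1 * se_m2
    return math.sqrt(max(radicand, 0.0))  # guard against tiny negative rounding

SE_MEAN = 2.0  # hypothetical standard error of each group mean

matched = standard_error_of_difference(SE_MEAN, SE_MEAN, 1.0)
uncorrelated = standard_error_of_difference(SE_MEAN, SE_MEAN, 0.0)

print(f"r12 = +1: standard error of the difference = {matched:.4f}")
print(f"r12 =  0: standard error of the difference = {uncorrelated:.4f}")
```

With perfect correlation and equal standard errors the result is zero, so that any observed difference yields an unbounded critical ratio; with zero correlation the result is 1.4142 times the standard error of either mean, which is the value formula (4) supplies.
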

Or consider such a comparison as that of heights of (a) ten thousand 
Japanese schoolboys of some well-defined classification and (b) a 
corresponding group of English schoolboys. There is no single parent 
population (the human race?) involved here; there are two parent 
populations, statistically speaking. Any effort to obtain an estimate 
of a “true standard deviation” for such a fictitious “total population” 
composed of “all cases in both observed samples” must lead to 
meaningless statistics. 


A REPLY TO J. D. LEITH 


W. R. VAN VOORHIS 


The Pennsylvania State College 


In the March, 1935, issue of this Journal, page 228, some comments 
were made concerning misuse of the formula for the Standard Error of 
the difference between sample means, and certain mathematical 
developments were presented after certain specified assumptions were 
made. In order to point out the theoretical soundness of that sta- 
tistical development, it will be advisable to consider the development 
from the standpoint of the assumptions involved. 

The formula for the Standard Error of the difference between 
sample means 


$$\sigma_{m_1 - m_2} = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2 - 2 r_{12}\sigma_{m_1}\sigma_{m_2}} \qquad (1)$$


expressed in terms of the standard deviations of the two parent 
populations ($\sigma_{p_1}$ and $\sigma_{p_2}$), the number of cases in each drawn sample ($n_1$ and 
$n_2$), and the Pearson coefficient of correlation between the two 
populations ($r_{12}$) was expressed in the form 


$$\sigma_{m_1 - m_2} = \sqrt{\frac{\sigma_{p_1}^2}{n_1} + \frac{\sigma_{p_2}^2}{n_2} - \frac{2 r_{12}\sigma_{p_1}\sigma_{p_2}}{\sqrt{n_1 n_2}}} \qquad (2)$$


Under the assumption that the two samples are drawn from the 
same population, in which case $\sigma_{p_1} = \sigma_{p_2} = \sigma_p$ and $r_{12} = 0$, formula (2) 
reduced to 


$$\sigma_{m_1 - m_2} = \sigma_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (3)$$


When the samples contain equal numbers of cases, i.e., $n_1 = n_2 = n$, 
formula (3) reduced to 


$$\sigma_{m_1 - m_2} = 1.4142\,\frac{\sigma_p}{\sqrt{n}} \qquad (4)$$


Formulae (1) to (4) yield exact results only when the different 
values called for in them are known. When those values are not 
known, care should be taken to determine whether or not our particular 
problem lends itself to the mathematical treatment of those formulae. 
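
The successive reductions from formula (2) to formula (4) can be checked numerically. In the sketch below the population standard deviation of 15 and the sample sizes are hypothetical figures chosen only for illustration, and the function names are not taken from the original paper.

```python
import math

def se_diff_general(sd_p1, sd_p2, n1, n2, r12):
    """Formula (2): general standard error of the difference between means."""
    radicand = (sd_p1 ** 2 / n1 + sd_p2 ** 2 / n2
                - 2.0 * r12 * sd_p1 * sd_p2 / math.sqrt(n1 * n2))
    return math.sqrt(max(radicand, 0.0))  # guard against tiny negative rounding

def se_diff_same_population(sd_p, n1, n2):
    """Formula (3): both samples from one population, r12 = 0."""
    return sd_p * math.sqrt(1.0 / n1 + 1.0 / n2)

def se_diff_equal_samples(sd_p, n):
    """Formula (4): as formula (3), with equal sample sizes."""
    return 1.4142 * sd_p / math.sqrt(n)

SD_P = 15.0  # hypothetical standard deviation of the parent population

unequal_by_2 = se_diff_general(SD_P, SD_P, 40, 60, 0.0)
unequal_by_3 = se_diff_same_population(SD_P, 40, 60)
equal_by_3 = se_diff_same_population(SD_P, 50, 50)
equal_by_4 = se_diff_equal_samples(SD_P, 50)

print(f"n = 40 and 60: formula (2) = {unequal_by_2:.4f}, formula (3) = {unequal_by_3:.4f}")
print(f"n = 50 and 50: formula (3) = {equal_by_3:.4f}, formula (4) = {equal_by_4:.4f}")
```

Formulas (2) and (3) agree exactly under the stated assumptions, and formula (4) agrees with formula (3) to the accuracy of the rounded constant 1.4142.
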
If a demand be made which fixes the size of any parameter which 
characterizes a sample, that sample fails to be a random one; hence, the 
standard error of the mean of such an arbitrary sample would be 
indeterminate unless assurance were given of the demand made upon 
the means of all succeeding samples. When one sample is drawn at 
random and another is selected to bear a constant relationship to that 
sample, the second sample loses its identity as a random sample and 
becomes merely a qualified variation of the first. It follows, therefore, 
that the second sample cannot represent a population distinct from the 
first except to the slight degree that the demands made upon its 
characterizing features distinguish it from the first sample. It is not 
always true that “the whole point in question is that of the probability 
of the existence of two distinct statistical populations, one represented 
by one sample, the second by the other sample.” 

If two samples be drawn at random from the same total population, 
$r_{12}$ equals zero, and formula (3) involving the standard deviation of the 
parent population is applicable. But if one sample be drawn at 
random and another sample from the same population be arranged 
so that certain statistical measures are fixed, e.g., $r_{12} = +1$ and 
$m_2 = m_1 + C$, where $C$ is a constant, then it follows that any function 
involving these measures will be affected. In particular the standard 
error of the difference between sample means will merely be the 
standard error of $C$ which, of course, equals zero. Formula (1) involving 
the two $\sigma_m$'s would be inapplicable because $\sigma_m$ for the second sample 
would be undefined even though the use of a “standard error of the 
mean” for the second sample computed erroneously on the basis of 
the standard deviation of the arranged sample would seem to give 
the desired results. Neither would formula (4) be applicable because 
of the failure to satisfy the condition of random selection. 

The Mark Twain “Calaveras County” jumping frog experiment set 
forth by Professor Leith does not present two samples drawn from the 
same population. Hence, formula (4) is not applicable. The control 
group of frogs represents a sample drawn from a total population; and 
the standard error of the mean of such a sample has a well-defined 
meaning. The experimental group is arbitrary in arrangement to the 
extent that the standard error of the mean is indeterminate. Conse- 
quently, the standard error of the difference of the means cannot be 
computed by formula (1) as suggested by Professor Leith. The truth 
of this contention can be seen by arbitrarily matching the two groups of 
frogs so that $r_{12}$ equals -1 instead of +1, in which case the correct 
result would not be given by that formula. 

If it be known that two samples are themselves drawn from the 
same totality, it is true that “any observed difference in sample means 
is accountable for on the basis of chance in sampling alone.” This 
fact, however, does not remove the need of determining the variability 
of differences in sample means, for we are not always certain of the 
nature of the function that characterizes the total population—the 
normal law does not cover all statistical data. It would seem that 
there is as much justification for determining the standard error of the 
difference between means of samples drawn from the same totality 
as there is for determining the standard error of sample means. 